CS 4641 Project

Introduction

In today’s world, the average consumer can access millions of songs at the click of a button, a catalog that spans eras, artistic expressions, and themes. While this gives music enthusiasts an exciting opportunity to engage more deeply with the music they enjoy, it also makes it challenging to organize, classify, and recommend songs to listeners with varied tastes.

Our team is impressed by the amount of research that has been done in this area and wants to add to this literature. We have located several similar studies that each test different models, such as Convolutional Neural Networks (CNNs) [1] and Support Vector Machines (SVMs) [2]. We hope to compare a broader range of models in order to build a better genre classifier.

We plan to accomplish this by analyzing a dataset of 1700 spectrograms derived from songs that are 270-300 seconds long [4]. The spectrograms are sorted into a hierarchy with 3 levels of classification and 16 distinct genres. Due to limitations in the dataset, our research will mainly be limited to English songs.

Problem Definition & Motivation

This project aims to classify songs by genre from their spectrograms, converting the visual data into numeric features. We have three classification subtasks: classical vs. non-classical; sub-genres (classical: symphony, opera, etc.; non-classical: pop, indie, rock, etc.); and sub-sub-genres (pop: teen pop, adult contemporary, etc.). We are motivated by the prospect that this work can help identify trends within genres, which would provide valuable insights for artists, labels, and consumers.
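One way to picture the three subtasks is as a label path through the genre hierarchy. The sketch below is illustrative only: `HIERARCHY`, `GenreLabels`, and `labels_for` are hypothetical names, and the tree contains just the example genres named above, not the dataset's full set of 16.

```python
from typing import NamedTuple, Optional

# Partial hierarchy built only from the examples named in the problem
# definition (assumption: the real dataset has more genres and may
# arrange them differently).
HIERARCHY = {
    "classical": {"symphony": [], "opera": []},
    "non-classical": {
        "pop": ["teen pop", "adult contemporary"],
        "indie": [],
        "rock": [],
    },
}


class GenreLabels(NamedTuple):
    level1: str            # classical vs. non-classical
    level2: str            # sub-genre
    level3: Optional[str]  # sub-sub-genre, if the sub-genre has one


def labels_for(genre: str) -> GenreLabels:
    """Return the label for each of the three subtasks for a given genre."""
    for top, subgenres in HIERARCHY.items():
        for sub, leaves in subgenres.items():
            if genre == sub:
                return GenreLabels(top, sub, None)
            if genre in leaves:
                return GenreLabels(top, sub, genre)
    raise KeyError(f"unknown genre: {genre}")


print(labels_for("teen pop"))
print(labels_for("opera"))
```

Storing one label per level lets each subtask be trained and scored separately while still sharing the same spectrogram inputs.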

Methods

The most promising method is likely the Convolutional Neural Network, a supervised learning method. CNNs are well suited to image recognition tasks, and since spectrograms are images that represent songs, CNNs may perform well [3]. We also plan to try traditional Machine Learning algorithms such as SVMs and Random Forests. To use these methods, we would extract features from the spectrograms to serve as inputs to these supervised algorithms. Depending on the features we choose after some exploratory data analysis, we may also apply PCA to reduce the dimensionality of our data and boost performance. On the unsupervised learning front, we may gain some insights from trying various clustering algorithms to group similar spectrograms.
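As a rough illustration of the traditional pipeline described above, the sketch below flattens images into feature vectors, reduces them with PCA, and fits an SVM and a Random Forest using scikit-learn. Everything specific here is an assumption made for the example: the data are synthetic random arrays standing in for spectrograms, raw flattened pixels stand in for the features we have yet to choose, and the image size, number of components, and model settings are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for spectrograms: 200 grayscale 32x32 "images"
# with two classes (assumption: real inputs would be loaded from the dataset).
n_samples, height, width = 200, 32, 32
labels = rng.integers(0, 2, size=n_samples)
images = rng.normal(size=(n_samples, height, width))
images[labels == 1] += 0.5  # shift one class so the toy problem is learnable

# Simplest possible feature extraction: flatten each image to a vector.
features = images.reshape(n_samples, -1)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# PCA reduces the 1024 pixel features to 20 components before each classifier.
models = {
    "svm": make_pipeline(
        StandardScaler(), PCA(n_components=20, random_state=0), SVC(kernel="rbf")
    ),
    "random_forest": make_pipeline(
        PCA(n_components=20, random_state=0),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # held-out accuracy
    print(f"{name}: {scores[name]:.2f}")
```

Wrapping PCA and the classifier in a single pipeline means the projection is fit on the training split only, which avoids leaking test data into the dimensionality reduction step.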

Potential Results

As noted in the previous section, we expect Convolutional Neural Networks to classify songs with a high degree of accuracy [3]. We will likely find that the success of traditional Machine Learning algorithms depends heavily on the features we select and create from the spectrograms, so choosing those features is unlikely to be straightforward. To judge how good each method is, we’ll be looking at the following metrics.

References

  1. N. M R and S. Mohan B S, “Music Genre Classification using Spectrograms,” 2020 International Conference on Power, Instrumentation, Control and Computing (PICC), Thrissur, India, 2020, pp. 1-5, doi: 10.1109/PICC51425.2020.9362364.
  2. Y. Costa, L. Soares de Oliveira, A. L. Koerich, and F. Gouyon, “Music genre recognition using spectrograms,” Intl. Conf. on Systems, Signal and Image Processing, 2011, pp. 1-4.
  3. M. Dong, “Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification,” CoRR, vol. abs/1802.09697, 2018.
  4. Z. Liu and Z. Li, “Music Data Sharing Platform for Computational Musicology Research (CCMUSIC DATASET),” Zenodo, Nov. 12, 2021, doi: 10.5281/ZENODO.5676893.

Dataset

Our dataset can be found here: https://huggingface.co/datasets/ccmusic-database/music_genre.

Proposed Timeline

This timeline is subject to change. A more detailed version can be found here.

TASK TITLE | TASK OWNER | START DATE | DUE DATE
Project Proposal | | |
Introduction & Background | James DiPrimo | 9/27/2023 | 10/6/2023
Problem Definition | Anirudh Ramesh | 9/27/2023 | 10/6/2023
Methods | Siddhant Dubey | 9/27/2023 | 10/6/2023
Timeline | Soongeol Kang | 9/27/2023 | 10/6/2023
Potential Results & Discussion | Joseph Campbell | 9/27/2023 | 10/6/2023
Video Recording | Siddhant Dubey | 9/27/2023 | 10/6/2023
GitHub Page | Siddhant Dubey | 9/27/2023 | 10/6/2023
Model 1 (GMM or K-means) | | |
Data Sourcing and Cleaning | James DiPrimo | 10/7/2023 | 10/13/2023
Model Selection | All | 10/13/2023 | 10/16/2023
Data Pre-Processing | Anirudh Ramesh | 10/16/2023 | 10/23/2023
Model Coding | Siddhant Dubey and Anirudh Ramesh | 10/23/2023 | 10/30/2023
Results Evaluation and Analysis | Joseph Campbell | 10/30/2023 | 11/2/2023
Midterm Report | Everyone | 10/31/2023 | 11/3/2023
Model 2 (CNN) | | |
Model Coding | Siddhant Dubey, Joseph Campbell | 10/28/2023 | 11/4/2023
Results Evaluation | Soongeol Kang | 11/5/2023 | 11/8/2023
Analysis | James DiPrimo | 11/6/2023 | 11/9/2023
Model 3 (SVMs) | | |
Midterm Report | Everyone | 11/3/2023 | 11/11/2023
Model Coding | Soongeol Kang, James DiPrimo | 11/11/2023 | 11/18/2023
Results Evaluation | Anirudh Ramesh | 11/18/2023 | 11/21/2023
Analysis | Joseph Campbell | 11/19/2023 | 11/22/2023
Model 4 (Random Forests) | | |
Model Coding | James DiPrimo, Anirudh Ramesh | 11/15/2023 | 11/22/2023
Results Evaluation | Siddhant Dubey | 11/20/2023 | 11/23/2023
Analysis | Soongeol Kang | 11/21/2023 | 11/24/2023
Evaluation | | |
Model Comparison | Everyone | 11/29/2023 | 12/4/2023
Presentation | Everyone | 12/1/2023 | 12/6/2023
Recording | Everyone | 12/6/2023 | 12/7/2023
Final Report | Everyone | 12/2/2023 | 12/8/2023

Contribution Table

Contribution | Person
Introduction | James
Problem Statement | Anirudh
Methods | Siddhant
Potential Results | Joseph
Proposed Timeline | Soongeol
Finding Datasets | Everyone
Finding Papers | Everyone