Project Proposal | CS 4641 Project

CS 4641 Project


Introduction

In today’s world, the average consumer can reach millions of songs with a single click, a catalog that spans eras, artistic expressions, and themes. While this gives music enthusiasts an exciting opportunity to engage more deeply with music, it also makes it challenging to organize, classify, and recommend songs to listeners with varied tastes.

Our team is impressed by the amount of research that has been done in this area and wants to contribute to it. We have located several similar studies that each test different models, such as Convolutional Neural Networks (CNNs) [1] and Support Vector Machines (SVMs) [2]. We hope to compare a broader range of models in order to build a better genre classifier.

We plan to accomplish this by analyzing a dataset of 1700 spectrograms derived from songs that are 270-300 seconds long, organized into a three-level classification hierarchy with 16 distinct genres. Because of limitations in the dataset, our research will focus mainly on English-language songs.

Problem Definition & Motivation

This project aims to classify the genres of songs from their spectrograms by converting the visual data into numeric data. We have three classification subtasks: classical vs. non-classical, sub-genres (classical: symphony, opera, etc.; non-classical: pop, indie, rock, etc.), and sub-sub-genres (pop: teen pop, adult contemporary, etc.). We are motivated to work on this because genre classification can help identify trends within genres, which would provide valuable insights for artists, labels, and consumers.
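The three subtasks can be pictured as one label hierarchy, where each song carries a label at every level. The sketch below is only an illustration: `HIERARCHY` and `labels_for` are hypothetical names, and the mapping contains just the genre names mentioned above, not the full set of 16 genres in the dataset.

```python
# Hypothetical, partial hierarchy built only from the genre names listed above.
# The real dataset defines 16 genres; this is not its actual label schema.
HIERARCHY = {
    "classical": {"symphony": [], "opera": []},
    "non-classical": {
        "pop": ["teen pop", "adult contemporary"],
        "indie": [],
        "rock": [],
    },
}


def labels_for(genre):
    """Return (level 1, level 2, level 3) labels for a genre name.

    Level 3 is None when the genre is itself a level-2 sub-genre.
    """
    for top, sub_genres in HIERARCHY.items():
        for sub, sub_sub_genres in sub_genres.items():
            if genre == sub:
                return (top, sub, None)
            if genre in sub_sub_genres:
                return (top, sub, genre)
    raise KeyError(f"unknown genre: {genre}")


teen_pop_labels = labels_for("teen pop")
opera_labels = labels_for("opera")
```

Each subtask then reads one position of the tuple: the first for classical vs. non-classical, the second for sub-genres, and the third for sub-sub-genres.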

Methods

The most promising method is likely Convolutional Neural Networks, a supervised learning method. CNNs perform very well on image recognition tasks, and since spectrograms are images that represent songs, they may produce good results [3]. We also plan to try traditional machine learning algorithms such as SVMs and Random Forests. To use these methods, we will extract features from the spectrograms to serve as inputs to these supervised algorithms. Depending on the features we choose after some exploratory data analysis, we may also apply Principal Component Analysis (PCA) to reduce the dimensionality of our data and improve performance. On the unsupervised learning front, we may gain some insights from trying various clustering algorithms to group similar spectrograms.
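As a rough illustration of the feature-extraction and PCA steps for the traditional models, the sketch below summarizes each spectrogram by per-band statistics and projects the result onto its top principal components. Everything here is an assumption made for the example: the per-band mean and standard deviation are just one possible feature choice, the function names are hypothetical, and the random arrays stand in for real spectrograms.

```python
import numpy as np


def band_statistics(spectrogram):
    """Summarize a (freq_bins, time_frames) array as per-band mean and std over time."""
    return np.concatenate([spectrogram.mean(axis=1), spectrogram.std(axis=1)])


def pca_project(features, n_components):
    """Project the rows of `features` onto the top principal components via SVD."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Stand-in data: 20 fake "spectrograms" with 64 frequency bands and 128 time frames.
spectrograms = rng.random((20, 64, 128))

features = np.stack([band_statistics(s) for s in spectrograms])  # shape (20, 128)
reduced = pca_project(features, n_components=5)                  # shape (20, 5)
```

The reduced matrix could then be passed to an SVM or Random Forest; which features actually work well is something the exploratory data analysis would have to decide.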

Potential Results

As briefly touched upon in the previous section, we believe that Convolutional Neural Networks will be able to classify songs with a high degree of accuracy [3]. We expect the success rate of traditional machine learning algorithms to depend heavily on the features we select and create from the spectrograms, so choosing those features is unlikely to be straightforward. To judge how good each method is, we will look at the following metrics:

Accuracy
F1 Score
Area under the ROC curve

We will also visualize our results with a confusion matrix.
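To make these metrics concrete, here is a minimal pure-Python sketch of how they could be computed from predictions. The function names and toy labels are hypothetical, the F1 score is macro-averaged (one of several possible averaging choices), and the AUC helper covers only the binary case, such as classical vs. non-classical; a multi-class AUC would need something like a one-vs-rest scheme.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels for illustration only; these are not results from the dataset.
labels = ["pop", "rock", "opera"]
y_true = ["pop", "pop", "rock", "rock", "opera", "opera"]
y_pred = ["pop", "rock", "rock", "rock", "opera", "pop"]

matrix = confusion_matrix(y_true, y_pred, labels)
toy_accuracy = accuracy(matrix)
toy_macro_f1 = macro_f1(matrix)
toy_auc = binary_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

In practice a library such as scikit-learn provides equivalent functions; the hand-written versions are shown only to spell out what each metric measures.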

References

[1] N. M R and S. Mohan B S, “Music Genre Classification using Spectrograms,” 2020 International Conference on Power, Instrumentation, Control and Computing (PICC), Thrissur, India, 2020, pp. 1-5, doi: 10.1109/PICC51425.2020.9362364.
[2] Y. Costa, L. Soares de Oliveira, A. L. Koerich, and F. Gouyon, “Music genre recognition using spectrograms,” Intl. Conf. on Systems, Signal and Image Processing, 2011, pp. 1-4.
[3] M. Dong, “Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification,” CoRR, vol. abs/1802.09697, 2018.
[4] Z. Liu and Z. Li, “Music Data Sharing Platform for Computational Musicology Research (CCMUSIC DATASET),” Zenodo, Nov. 12, 2021, doi: 10.5281/ZENODO.5676893.

Dataset

Our dataset can be found here: https://huggingface.co/datasets/ccmusic-database/music_genre.

Proposed Timeline

This timeline is subject to change. A more detailed version can be found here.

TASK TITLE	TASK OWNER	START DATE	DUE DATE
Project Proposal
Introduction & Background	James DiPrimo	9/27/2023	10/6/2023
Problem Definition	Anirudh Ramesh	9/27/2023	10/6/2023
Methods	Siddhant Dubey	9/27/2023	10/6/2023
Timeline	Soongeol Kang	9/27/2023	10/6/2023
Potential Results & Discussion	Joseph Campbell	9/27/2023	10/6/2023
Video Recording	Siddhant Dubey	9/27/2023	10/6/2023
GitHub Page	Siddhant Dubey	9/27/2023	10/6/2023
Model 1 (GMM or K-means)
Data Sourcing and Cleaning	James DiPrimo	10/7/2023	10/13/2023
Model Selection	All	10/13/2023	10/16/2023
Data Pre-Processing	Anirudh Ramesh	10/16/2023	10/23/2023
Model Coding	Siddhant Dubey and Anirudh Ramesh	10/23/2023	10/30/2023
Results Evaluation and Analysis	Joseph Campbell	10/30/2023	11/2/2023
Midterm Report	Everyone	10/31/2023	11/3/2023
Model 2 (CNN)
Model Coding	Siddhant Dubey, Joseph Campbell	10/28/2023	11/4/2023
Results Evaluation	Soongeol Kang	11/5/2023	11/8/2023
Analysis	James DiPrimo	11/6/2023	11/9/2023
Model 3 (SVMs)
Midterm Report	Everyone	11/3/2023	11/11/2023
Model Coding	Soongeol Kang, James DiPrimo	11/11/2023	11/18/2023
Results Evaluation	Anirudh Ramesh	11/18/2023	11/21/2023
Analysis	Joseph Campbell	11/19/2023	11/22/2023
Model 4 (Random Forests)
Model Coding	James DiPrimo, Anirudh Ramesh	11/15/2023	11/22/2023
Results Evaluation	Siddhant Dubey	11/20/2023	11/23/2023
Analysis	Soongeol Kang	11/21/2023	11/24/2023
Evaluation
Model Comparison	Everyone	11/29/2023	12/4/2023
Presentation	Everyone	12/1/2023	12/6/2023
Recording	Everyone	12/6/2023	12/7/2023
Final Report	Everyone	12/2/2023	12/8/2023

Contribution Table

Contribution	Person
Introduction	James
Problem Statement	Anirudh
Methods	Siddhant
Potential Results	Joseph
Proposed Timeline	Soongeol
Finding Datasets	Everyone
Finding Papers	Everyone