
Classify Song Genres from Audio Data

Rock or rap? Apply machine learning methods in Python to classify songs into genres.

  • 10 tasks
  • 4,797 participants
  • 1,500 XP

Project Description

Using a dataset of songs from two music genres (Hip-Hop and Rock), you will train a classifier to distinguish between the two genres based only on track information derived from Echonest (now part of Spotify). You will first use the pandas and seaborn packages in Python to subset the data, aggregate information, and create plots while exploring the data for obvious trends or factors you should be aware of before doing machine learning.
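
As a rough illustration of the exploration step described above, here is a minimal pandas sketch. The data is randomly generated, and the column names (`genre_top`, `danceability`, `energy`, `tempo`) are assumptions made for this example rather than the project's actual schema. Plotting with seaborn is left out so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the track data: names and values are assumptions.
rng = np.random.default_rng(0)
n = 200
features = ["danceability", "energy", "tempo"]
tracks = pd.DataFrame({
    "genre_top": rng.choice(["Hip-Hop", "Rock"], size=n, p=[0.2, 0.8]),
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "tempo": rng.normal(120, 25, n),
})

# Subsetting: keep only the Rock tracks.
rock = tracks[tracks["genre_top"] == "Rock"]

# Aggregating: class counts and per-genre feature means.
genre_counts = tracks["genre_top"].value_counts()
genre_means = tracks.groupby("genre_top")[features].mean()

# Pairwise correlations between the continuous features.
corr = tracks[features].corr()

print(genre_counts)
print(genre_means.round(2))
print(corr.round(2))
```

Checking the class counts early is useful because a strong imbalance between genres affects how later accuracy figures should be read.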

Next, you will use the scikit-learn package to test whether you can correctly classify a song's genre from features such as danceability, energy, acousticness, and tempo. You will go over implementations of common algorithms such as PCA, logistic regression, and decision trees.
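
The modeling steps named above could be sketched roughly as follows. This is not the project's own solution: the data comes from scikit-learn's `make_classification` as a synthetic stand-in for the audio features, and the number of PCA components (6) and the random seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the track features (an assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=10
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10, stratify=y
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=10),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    # Scale the features, project onto principal components, then classify.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=6), model)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Wrapping the scaler and PCA in a pipeline means both are fitted on the training split only, which avoids leaking information from the test split.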

Project Tasks

  • 1. Preparing our dataset
  • 2. Pairwise relationships between continuous variables
  • 3. Normalizing the feature data
  • 4. Principal Component Analysis on our scaled data
  • 5. Further visualization of PCA
  • 6. Train a decision tree to classify genre
  • 7. Compare our decision tree to a logistic regression
  • 8. Balance our data for greater performance
  • 9. Does balancing our dataset improve model bias?
  • 10. Using cross-validation to evaluate our models
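
The last three tasks (balancing the classes, then evaluating with cross-validation) might look something like the sketch below. It assumes balancing is done by downsampling the majority class, which is only one possible approach. The data, the feature names `f1` to `f4`, the class sizes, and the choice of 10 folds are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical imbalanced data: 400 Rock tracks versus 100 Hip-Hop tracks.
rng = np.random.default_rng(10)
n_rock, n_hop = 400, 100
data = pd.DataFrame(
    rng.normal(size=(n_rock + n_hop, 4)), columns=["f1", "f2", "f3", "f4"]
)
data["genre_top"] = ["Rock"] * n_rock + ["Hip-Hop"] * n_hop
# Shift two features for Hip-Hop so the classes are separable.
data.loc[data["genre_top"] == "Hip-Hop", ["f1", "f2"]] += 1.5

# Balance by downsampling the majority class to the minority class size.
hop = data[data["genre_top"] == "Hip-Hop"]
rock = data[data["genre_top"] == "Rock"].sample(n=len(hop), random_state=10)
balanced = pd.concat([rock, hop])

X = balanced.drop(columns="genre_top").to_numpy()
y = balanced["genre_top"].to_numpy()

# K-fold cross-validation; shuffling matters because rows are grouped by class.
cv = KFold(n_splits=10, shuffle=True, random_state=10)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_scores = cross_val_score(pipe, X, y, cv=cv)
print(f"mean accuracy over {len(cv_scores)} folds: {cv_scores.mean():.3f}")
```

Averaging over folds gives a steadier estimate than a single train/test split, since every row is used for testing exactly once.
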
Lina Tran

PhD Candidate at University of Toronto

Lina studies learning and memory in the Frankland Lab at the Hospital for Sick Children/University of Toronto.

Joel Östblom

PhD Candidate at University of Toronto

Joel is a PhD student in Biomedical Engineering at the University of Toronto, where he uses computational and experimental approaches to better understand fundamental stem cell decisions. Outside school, he enjoys playing ice hockey, eating and making food, being in nature, and figuring out how he can maximize the time he spends inside vim.

Ahmed Hasan

PhD Candidate at University of Toronto

Ahmed Hasan is a PhD student in the Department of Cell and Systems Biology at the University of Toronto. An active user of both R and Python, his research focuses on understanding how genetic recombination affects how genomes evolve.



  • Python
  • Topics: Data Manipulation, Data Visualization, Machine Learning, Importing & Cleaning Data