Classify Song Genres from Audio Data

Rock or rap? Apply machine learning methods in Python to classify songs into genres.

10 Tasks | 1,500 XP


Project Description

Using a dataset of songs from two music genres (Hip-Hop and Rock), you will train a classifier to distinguish between the two genres based only on track information derived from [Echonest](http://the.echonest.com) (now part of Spotify). You will first use the `pandas` and `seaborn` packages in Python to subset the data, aggregate information, and create plots while exploring the data for obvious trends or factors to be aware of before doing machine learning. Next, you will use the `scikit-learn` package to test whether a song's genre can be correctly classified from features such as danceability, energy, acousticness, and tempo. Along the way, you will work through implementations of common algorithms such as PCA, logistic regression, and decision trees.
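
As a rough illustration of the exploration and preprocessing steps described above, the sketch below computes pairwise correlations, standardizes the features, and runs PCA. The data is synthetic, and the column names (`danceability`, `energy`, `acousticness`, `tempo`, `genre`) are borrowed from the description purely for illustration; the project's actual dataset and column layout may differ.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the track features; NOT the project's real data.
rng = np.random.default_rng(0)
n_tracks = 200
tracks = pd.DataFrame({
    "danceability": rng.uniform(0, 1, n_tracks),
    "energy": rng.uniform(0, 1, n_tracks),
    "acousticness": rng.uniform(0, 1, n_tracks),
    "tempo": rng.normal(120, 30, n_tracks),
    "genre": rng.choice(["Hip-Hop", "Rock"], n_tracks),
})

# Pairwise correlations between the continuous features.
features = tracks.drop(columns="genre")
correlations = features.corr()

# Standardize so that tempo (in BPM) does not dominate the 0-1 features.
scaled = StandardScaler().fit_transform(features)

# PCA on the scaled features; inspect how much variance each component explains.
pca = PCA()
projected = pca.fit_transform(scaled)
explained = pca.explained_variance_ratio_
cumulative = np.cumsum(explained)
```

Plotting `cumulative` against the component index is one common way to decide how many components to keep before training a classifier.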

Project Tasks

  1. Preparing our dataset
  2. Pairwise relationships between continuous variables
  3. Normalizing the feature data
  4. Principal Component Analysis on our scaled data
  5. Further visualization of PCA
  6. Train a decision tree to classify genre
  7. Compare our decision tree to a logistic regression
  8. Balance our data for greater performance
  9. Does balancing our dataset improve model bias?
  10. Using cross-validation to evaluate our models
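
The modeling tasks in the list above (training a decision tree, comparing it to logistic regression, balancing the classes, and cross-validating) might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's solution: the data is synthetic, the feature names `a` to `d` are placeholders, and balancing is done here by downsampling the majority class, which is only one of several possible approaches.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately imbalanced stand-in data; NOT the project's dataset.
rng = np.random.default_rng(0)
rock = pd.DataFrame(rng.normal(0.0, 1.0, size=(400, 4)), columns=list("abcd"))
hiphop = pd.DataFrame(rng.normal(1.0, 1.0, size=(100, 4)), columns=list("abcd"))
rock["genre"] = "Rock"
hiphop["genre"] = "Hip-Hop"

# Assumed balancing strategy: downsample the majority class to the minority size.
rock_sample = rock.sample(n=len(hiphop), random_state=0)
balanced = pd.concat([rock_sample, hiphop], ignore_index=True)

X = balanced.drop(columns="genre")
y = balanced["genre"]

# Scaling sits inside each pipeline so every fold is scaled on its own training split.
models = {
    "decision_tree": make_pipeline(
        StandardScaler(), DecisionTreeClassifier(random_state=0)
    ),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
}

# 10-fold cross-validation; the default score for classifiers is accuracy.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv) for name, model in models.items()}
mean_accuracy = {name: fold_scores.mean() for name, fold_scores in scores.items()}
```

Comparing `mean_accuracy` across the two models gives a steadier picture than a single train/test split, since each score is averaged over ten different held-out folds.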

Technologies

Python

Topics

Data Manipulation, Data Visualization, Machine Learning, Importing & Cleaning Data

Lina Tran

PhD Candidate at University of Toronto

Lina studies learning and memory in the Frankland Lab at the Hospital for Sick Children/University of Toronto.

Joel Östblom

PhD Candidate at University of Toronto

Joel is a PhD student in Biomedical Engineering at the University of Toronto, where he uses computational and experimental approaches to better understand fundamental stem cell decisions. Outside school, he enjoys playing ice hockey, eating and making food, being in nature, and figuring out how he can maximize the time he spends inside vim.

Ahmed Hasan

PhD Candidate at University of Toronto

Ahmed Hasan is a PhD student in the Department of Cell and Systems Biology at the University of Toronto. An active user of both R and Python, his research focuses on understanding how genetic recombination affects how genomes evolve.
