Dance-Themed Playlist for Summer Party 🎶💃🕺
Background
It's that vibrant time of year again: summer has arrived (for those of us in the Northern Hemisphere at least)! There's an energy in the air that inspires us to get up and move. In keeping with this exuberance, your company has decided to host a dance party to celebrate. And you, with your unique blend of creativity and analytical expertise, have been entrusted with curating a dance-themed playlist that will set the perfect mood for this electrifying night. The question is: how can you identify the songs that will make the attendees dance their hearts out? This is where your coding skills come into play.
💾 The Data
We have assembled information on more than 125 genres of Spotify music tracks in a file called spotify.csv, with each genre containing approximately 1000 tracks. All tracks, from all time, have been taken into account without any time period limitations. However, the data collection was concluded in October 2022. Each row represents a track that has some audio features associated with it.
Let's get the party started!
Column | Description |
---|---|
track_id | The Spotify ID number of the track. |
artists | Names of the artists who performed the track, separated by a ; if there's more than one. |
album_name | The name of the album that includes the track. |
track_name | The name of the track. |
popularity | A value from 0 to 100, with 100 being the most popular. It is calculated from the number of times the track has been played recently, with more recent plays contributing more to the score. Duplicate tracks are scored independently. |
duration_ms | The length of the track, measured in milliseconds. |
explicit | Indicates whether the track contains explicit lyrics. true means it does, false means it does not or it's unknown. |
danceability | A score between 0.0 and 1.0 that represents the track's suitability for dancing. It is calculated algorithmically from factors like tempo, rhythm stability, beat strength, and regularity. |
energy | A score between 0.0 and 1.0 indicating the track's intensity and activity level. Energetic tracks tend to be fast, loud, and noisy. |
key | The key the track is in. Integers map to pitches using standard pitch class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1. |
loudness | The overall loudness, measured in decibels (dB). |
mode | The modality of a track, represented as 1 for major and 0 for minor. |
speechiness | Measures the amount of spoken words in a track. A value close to 1.0 denotes speech-based content, while 0.33 to 0.66 indicates a mix of speech and music like rap. Values below 0.33 are usually music and non-speech tracks. |
acousticness | A confidence measure from 0.0 to 1.0, with 1.0 representing the highest confidence that the track is acoustic. |
instrumentalness | Instrumentalness estimates the likelihood of a track being instrumental. Non-lyrical sounds such as "ooh" and "aah" are considered instrumental, whereas rap or spoken word tracks are classified as "vocal". A value closer to 1.0 indicates a higher probability that the track lacks vocal content. |
liveness | A measure of the probability that the track was performed live. Scores above 0.8 indicate a high likelihood of the track being live. |
valence | A score from 0.0 to 1.0 representing the track's positiveness. High scores suggest a more positive or happier track. |
tempo | The track's estimated tempo, measured in beats per minute (BPM). |
time_signature | An estimate of the track's time signature (meter), which is a notational convention to specify how many beats are in each bar (or measure). The value ranges from 3 to 7, indicating time signatures of 3/4 to 7/4. |
track_genre | The genre of the track. |
Source (data has been modified)
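To make the encoded columns in the table above easier to read, here is a minimal sketch that decodes `key`, `mode`, and `duration_ms` into human-readable form. The helper names `describe_key` and `ms_to_min_sec` are hypothetical and not part of the original analysis; the pitch names follow standard pitch class notation.

```python
# Pitch-class names for the `key` column (0 = C ... 11 = B).
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def describe_key(key: int, mode: int) -> str:
    """Hypothetical helper: turn (key, mode) into a label such as 'D minor'."""
    if key == -1:  # -1 means no key was detected
        return "unknown"
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


def ms_to_min_sec(duration_ms: int) -> str:
    """Hypothetical helper: format a duration in milliseconds as m:ss."""
    minutes, seconds = divmod(duration_ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


print(describe_key(2, 0))     # key=2, mode=0
print(ms_to_min_sec(215000))  # 215 seconds
```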
Objective:
Create a dance-themed playlist for the company's summer party using data analysis and predictions.
Approach:
To curate the ultimate dance party playlist, we will:
- Explore audio features using descriptive statistics and data visualization techniques.
- Develop and apply a machine learning model to predict song danceability.
- Interpret model outcomes and utilize data-driven insights to select the top 50 songs for our playlist.
Executive Summary:
Our journey through this dataset illuminates the impact of genres, the influence of audio features, and the interplay between artists, albums, and tracks on the prediction and selection of danceable songs. We've explored the thematic dominance of holiday tracks, the danceability spectrum across genres, and the intricate relationship between audio features and danceability scores.
Findings:
Investigation across various genres showed that while the "kids" genre led in terms of danceability, it might not suit the mood and demographics of our adult audience. More suitable genres with high danceability, like "chicago-house," "latino," and "reggaeton," can be used to foster a dynamic party atmosphere.
Our analysis revealed correlations between specific audio features and danceability. High-energy songs consistently showed high danceability, while the danceability of acoustic music varied widely. Interestingly, both positive and negative emotions seem able to fuel danceability, depending on the track's other characteristics.
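One simple way to check relationships like these is to rank the audio features by their correlation with danceability. The sketch below uses a tiny hand-made frame whose values are invented for illustration; it is not the analysis that produced the findings above, and on the real data you would pass the numeric columns of `spotify` instead.

```python
import pandas as pd

# Illustrative values only; these rows are made up, not taken from spotify.csv.
sample = pd.DataFrame({
    "danceability": [0.80, 0.65, 0.30, 0.72, 0.45],
    "energy":       [0.90, 0.70, 0.20, 0.85, 0.40],
    "valence":      [0.60, 0.75, 0.25, 0.30, 0.50],
    "acousticness": [0.05, 0.20, 0.90, 0.10, 0.60],
})

# Pearson correlation of each feature with danceability,
# ordered by absolute strength so negative links are not buried.
corr_with_dance = (
    sample.corr()["danceability"]
    .drop("danceability")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_dance)
```

Keep in mind that a linear correlation coefficient can hide patterns such as the wide spread seen for acoustic tracks, so a scatter plot is a useful companion.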
Model outcomes:
- Our danceability prediction model, based on regression techniques, reveals the significant impact of attributes like genres, valence, tempo, and energy on danceability.
- Diverse genre trends show Chicago House and Reggaeton consistently score high in danceability, reflecting a rich musical landscape.
- The RandomForestRegressor is our chosen model, combining predictive accuracy with efficiency.
- Feature importance scores emphasize genres, valence, tempo, and energy as primary influencers of danceability.
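As a rough illustration of how feature importance scores can be read from a `RandomForestRegressor`, the sketch below fits the model on synthetic data. The data, the target formula, and the hyperparameters are all assumptions made for the example; genre is left out for brevity (it would need encoding first), and this is not the model or the configuration used in this report.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the audio features (assumption, not real data).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "valence": rng.random(n),
    "tempo": rng.uniform(60, 180, n),
    "energy": rng.random(n),
})
# Made-up target: driven mostly by valence, then energy, plus noise.
y = 0.6 * X["valence"] + 0.3 * X["energy"] + rng.normal(0, 0.05, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```

Impurity-based importances can favor features with many distinct values, so permutation importance is worth considering as a cross-check.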
Recommendations:
- Genre Mix and Danceability Score: We can opt to use a range of highly danceable genres such as "latino", "chicago-house", and "hip-hop" that our model predicts with high danceability scores.
- Energy and Popular Hits: Incorporate high-energy tracks and well-liked songs into the playlist. These elements have been shown to evoke a party atmosphere and cater to a diverse audience.
- Smooth Transitions and Song Length: Strategically arrange shorter songs to maintain a lively flow and keep up the party's momentum.
- Theme Options and Audience Engagement: Consider creating themed segments in your playlist for a unique atmosphere, and remain open to crowd preferences and song requests to ensure a vibrant dance floor.
After excluding the kids and children genres, we have our first 10 predictions:
index | track_name | artists | track_genre | danceability | Prediction |
---|---|---|---|---|---|
0 | Mek It Bunx Up | DeeWunn;Marcy Chin | dancehall | 2.1777886927 | 1.949149415 |
1 | First Class | Jack Harlow | hip-hop | 1.9390414473 | 1.9384069016 |
2 | Qué Más Pues? | J Balvin;Maria Becerra | latin | 1.933356989 | 1.9321765651 |
3 | Bad Ass Bitches | Wiz Khalifa | dance | 1.933356989 | 1.8753651366 |
4 | No Problem (From "Love Birds") | Apache Indian;A.R. Rahman | dancehall | 1.9902015713 | 1.8527485913 |
5 | Budget | Megan Thee Stallion;Latto | dance | 2.0356772371 | 1.84774203 |
6 | bury a friend | Billie Eilish | electro | 1.8367211992 | 1.8330853467 |
7 | Salió El Sol | Don Omar | hip-hop | 1.831036741 | 1.8327632042 |
8 | Tropicana | Boomdabash;Annalisa | reggae | 2.0640995282 | 1.8325329001 |
9 | Los Tres De Zacatecas | Banda Autentica de Jerez | r-n-b | 1.1375328377 | 1.831660739 |
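These tracks promise an electrifying dance experience with their high danceability scores. The danceability and Prediction values exceed the 0.0 to 1.0 range given in the column descriptions, which suggests they are shown on a transformed (likely standardized) scale rather than as raw scores. For additional track predictions, see the final prediction section.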
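The selection step described above (drop the kids and children genres, then rank by predicted danceability) can be sketched as follows. The frame, the track names, and the `top_tracks` helper are hypothetical; only the column names `track_genre` and `Prediction` come from the table.

```python
import pandas as pd

# Hypothetical predictions; names and numbers are placeholders.
predictions = pd.DataFrame({
    "track_name": ["Track A", "Track B", "Track C", "Track D", "Track E"],
    "track_genre": ["kids", "dancehall", "children", "hip-hop", "latin"],
    "Prediction": [2.10, 1.95, 2.00, 1.94, 1.93],
})

EXCLUDED_GENRES = ["kids", "children"]


def top_tracks(df: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Hypothetical helper: drop excluded genres, keep the n highest predictions."""
    keep = ~df["track_genre"].isin(EXCLUDED_GENRES)
    return (
        df[keep]
        .sort_values("Prediction", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )


playlist = top_tracks(predictions, n=3)
print(playlist)
```

With the real predictions, calling `top_tracks(predictions)` with the default `n=50` would yield the full playlist.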
1. Let's start our analysis by validating our dataset
In the initial phase of our analysis, we prioritize data validation. This crucial step involves addressing issues such as duplicated rows and missing values, ensuring the dataset's reliability and quality before proceeding with further analysis.
# Import the necessary libraries
import pandas as pd
import numpy as np
import copy
import joblib
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msn
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
# 'seaborn-whitegrid' was renamed to 'seaborn-v0_8-whitegrid' in matplotlib 3.6;
# keep the old name if you are on an earlier version.
plt.style.use('seaborn-v0_8-whitegrid')
Loading the dataset
- Data Overview
- View the first 5 rows of Original Dataset
spotify = pd.read_csv('data/spotify.csv')
spotify.head()
The dataset consists of 113,027 rows and 20 columns. Most columns have the correct data types. However, 'artists', 'album_name', and 'track_name' each have a single missing value, while the other variables are complete.
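Figures like the row count, data types, and missing values can be obtained with a few pandas calls. The sketch below runs them on a small made-up frame (named `toy` here, an assumption) so it is self-contained; on the real data the same calls would be applied to `spotify`.

```python
import pandas as pd

# Made-up rows with one missing value in each text column.
toy = pd.DataFrame({
    "track_id": ["a1", "b2", "c3", "d4"],
    "artists": ["X", None, "Y;Z", "W"],
    "album_name": ["Album 1", "Album 2", None, "Album 3"],
    "track_name": ["Song 1", "Song 2", "Song 3", None],
    "danceability": [0.7, 0.5, 0.9, 0.4],
})

print(toy.shape)   # (rows, columns)
print(toy.dtypes)  # data type of each column

# Count missing values per column and show only columns that have any.
missing = toy.isna().sum()
print(missing[missing > 0])
```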
Validating each categorical variable:
- Duplicate track_id entries need removal.
- Artists' names often appear multiple times, which is expected due to their multiple tracks and albums.
- Further investigation is required for album duplicates.
- Track name duplicates should also be addressed.
- Most tracks are non-explicit, which suits the theme of dance-oriented playlists.
- The dataset contains 114 unique track genres, fewer than the "more than 125" stated in the data description.
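The categorical checks listed above can be sketched with standard pandas calls. The frame below is invented for illustration, and keeping the first row per `track_id` is an assumption: the same track may be listed under several genres, so which duplicate to keep is a judgment call.

```python
import pandas as pd

# Made-up rows: track "a1" appears twice under different genres.
toy = pd.DataFrame({
    "track_id": ["a1", "a1", "b2", "c3"],
    "track_name": ["Song 1", "Song 1", "Song 2", "Song 2"],
    "track_genre": ["dance", "pop", "latin", "latin"],
    "explicit": [False, False, True, False],
})

# Number of rows whose track_id already appeared earlier.
n_dup_ids = toy["track_id"].duplicated().sum()

# Keep the first occurrence of each track_id (assumption, see lead-in).
deduped = toy.drop_duplicates(subset="track_id", keep="first")

# Number of distinct genres and share of explicit tracks.
n_genres = toy["track_genre"].nunique()
explicit_share = toy["explicit"].mean()

print(n_dup_ids, len(deduped), n_genres, explicit_share)
```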
Validating numerical values