Spotify Music Data

This dataset contains roughly 600 songs that ranked among the top songs of each year from 2010 to 2019 (as measured by Billboard). For every song it includes attributes pulled from Spotify, such as beats per minute, amount of spoken words, loudness, and energy.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd
spotify = pd.read_csv("spotify_top_music.csv", index_col=0)
print(spotify.shape)
spotify.head(100)

# Identify Popular Artists and Genres
# Count the number of songs by each artist
popular_artists = spotify['artist'].value_counts()

# Count the number of songs by genre
popular_genres = spotify['top genre'].value_counts()

print("Most Popular Artists:\n", popular_artists.head(10))
print("Most Popular Genres:\n", popular_genres.head(10))
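The counts above measure popularity by number of chart appearances. A complementary view is the average Spotify popularity score per artist. The following is a minimal sketch, assuming the `pop` column from the data dictionary; the helper name `rank_artists_by_popularity` and the `min_songs` threshold are illustrative choices, not part of the dataset.

```python
import pandas as pd

def rank_artists_by_popularity(df: pd.DataFrame, min_songs: int = 3) -> pd.DataFrame:
    """Rank artists by mean `pop` score, keeping artists with at least `min_songs` songs."""
    # Count songs and average the popularity score for each artist
    stats = df.groupby("artist")["pop"].agg(songs="count", mean_pop="mean")
    # Drop artists with too few songs, whose averages would be noisy
    stats = stats[stats["songs"] >= min_songs]
    return stats.sort_values("mean_pop", ascending=False)

# Usage with the DataFrame loaded above:
# print(rank_artists_by_popularity(spotify).head(10))
```

The threshold is a trade-off: a lower `min_songs` keeps more artists but lets one-hit entries dominate the ranking.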

# Visualize Trends Over Time
import matplotlib.pyplot as plt
import seaborn as sns

# Use only numeric columns for the yearly means; drop 'year' because it is the grouping key
numeric_columns = spotify.select_dtypes(include='number').columns.drop('year')
yearly_trends = spotify.groupby('year')[numeric_columns].mean()

# Visualizing energy, valence, and danceability over time
trend_labels = {'nrgy': 'Energy', 'val': 'Valence', 'dnce': 'Danceability'}
plt.figure(figsize=(12, 6))
# Rename the columns before plotting so the automatic legend always matches the lines
sns.lineplot(data=yearly_trends[list(trend_labels)].rename(columns=trend_labels), markers=True)
plt.title("Yearly Trends in Energy, Valence, and Danceability")
plt.ylabel("Average Value")
plt.xlabel("Year")
plt.show()
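The plot above covers three features that share a similar scale. To put all numeric columns on one chart (for example bpm next to dB), the yearly means can first be rescaled to the range 0 to 1. This is a sketch under stated assumptions: the helper name `scaled_yearly_means` is hypothetical, and min-max scaling is one of several reasonable choices.

```python
import pandas as pd

def scaled_yearly_means(df: pd.DataFrame, year_col: str = "year") -> pd.DataFrame:
    """Average every numeric column per year, then min-max scale each column to [0, 1]."""
    # Numeric feature columns, excluding the grouping key itself
    numeric = df.select_dtypes(include="number").columns.drop(year_col)
    yearly = df.groupby(year_col)[numeric].mean()
    span = yearly.max() - yearly.min()
    # A column whose yearly mean never changes has span 0; report it as 0 instead of dividing by zero
    return ((yearly - yearly.min()) / span.where(span != 0)).fillna(0.0)

# Usage with the DataFrame loaded above (requires matplotlib):
# scaled_yearly_means(spotify).plot(figsize=(12, 6), marker="o")
```

After scaling, each line shows when a feature was at its lowest (0) and highest (1) yearly average, so the absolute values are no longer readable from the chart.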

# Predict Song Genre Using a Classifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Preparing the dataset
X = spotify.iloc[:, 3:14]  # Features: columns 3 to 13 (year through pop); the slice end is exclusive
y = spotify['top genre']   # Target variable

# Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Training the model
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Evaluating the model
y_pred = clf.predict(X_test)
# zero_division=0 silences warnings for rare genres that are never predicted
print(classification_report(y_test, y_pred, zero_division=0))
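With many genres and only a few songs in most of them, the accuracy in the report is easier to interpret next to a trivial baseline. The sketch below uses scikit-learn's `DummyClassifier` to always predict the most frequent genre in the training data; the helper name `majority_baseline_accuracy` is an illustrative assumption.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

def majority_baseline_accuracy(X_train, y_train, X_test, y_test) -> float:
    """Accuracy of always predicting the most frequent class seen in training."""
    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X_train, y_train)
    return accuracy_score(y_test, baseline.predict(X_test))

# Usage with the split created above:
# print(majority_baseline_accuracy(X_train, y_train, X_test, y_test))
```

If the random forest does not clearly beat this number, the audio features are adding little beyond the genre frequencies.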

Data dictionary

	Variable	Explanation
0	title	The title of the song
1	artist	The artist of the song
2	top genre	The genre of the song
3	year	The year the song was on the Billboard chart
4	bpm	Beats per minute: the tempo of the song
5	nrgy	The energy of the song: higher values mean more energetic (fast, loud)
6	dnce	The danceability of the song: higher values mean it's easier to dance to
7	dB	Decibels: the loudness of the song
8	live	Liveness: likelihood that the song was recorded with a live audience
9	val	Valence: higher values mean a more positive sound (happy, cheerful)
10	dur	The duration of the song
11	acous	The acousticness of the song: likelihood that the song is acoustic
12	spch	Speechiness: higher values mean more spoken words
13	pop	Popularity: higher values mean more popular

Source of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

🗺️ Explore: Which artists and genres are the most popular?
📊 Visualize: Visualize the numeric values as a time-series by year. Can you spot any changes over the years?
🔎 Analyze: Train and build a classifier to predict a song's genre based on columns 3 to 13.

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

Your friend, who is an aspiring musician, wants to make a hit song and has asked you to use your data skills to help her. You have decided to analyze what makes a top song, keeping in mind changes over the years. What concrete recommendations can you give her before she writes lyrics, makes beats, and records the song? She's open to any genre!
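One possible starting point for this scenario is to check how strongly each numeric feature is associated with the popularity score. This is a minimal sketch, assuming the `pop` column from the data dictionary as the target; the helper name `popularity_correlations` is hypothetical. Correlation does not show cause, and every song in the dataset is already a hit, so treat the output as a lead to investigate, not a recommendation.

```python
import pandas as pd

def popularity_correlations(df: pd.DataFrame, target: str = "pop") -> pd.Series:
    """Pearson correlation of each numeric column with `target`, strongest first."""
    numeric = df.select_dtypes(include="number")
    # Correlate every numeric column with the target, then drop the target's self-correlation
    corr = numeric.corr()[target].drop(target)
    # Sort by absolute strength so strong negative relationships are not buried
    return corr.sort_values(key=abs, ascending=False)

# Usage with the DataFrame loaded above:
# print(popularity_correlations(spotify))
# To account for changes over the years, run it on one slice at a time, e.g.:
# print(popularity_correlations(spotify[spotify["year"] >= 2015]))
```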

You will need to prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.
