
Spotify Music Data

This dataset consists of ~600 songs that were in the top songs of the year from 2010 to 2019 (as measured by Billboard). You can explore interesting song data pulled from Spotify such as the beats per minute, amount of spoken words, loudness, and energy of every song.

Not sure where to begin? Scroll to the bottom to find challenges!

I analysed the Spotify dataset to see what the data can tell us. I computed summary statistics to get a feel for the values and plotted yearly averages to see how several audio features changed over the decade. The goal is to organize the raw data into usable information before deciding how to answer more complex questions or build predictive models. A solid understanding of the dataset's fundamentals makes that later work more efficient.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

# Use the ggplot style for all matplotlib figures
plt.style.use('ggplot')

df = pd.read_csv("spotify_top_music.csv", index_col=0)
df.head()
df.info()
df.describe()

Data dictionary

|    | Variable  | Explanation |
|----|-----------|-------------|
| 0  | title     | The title of the song |
| 1  | artist    | The artist of the song |
| 2  | top genre | The genre of the song |
| 3  | year      | The year the song appeared in the Billboard rankings |
| 4  | bpm       | Beats per minute: the tempo of the song |
| 5  | nrgy      | The energy of the song: higher values mean more energetic (fast, loud) |
| 6  | dnce      | The danceability of the song: higher values mean it's easier to dance to |
| 7  | dB        | Decibel: the loudness of the song |
| 8  | live      | Liveness: likelihood the song was recorded with a live audience |
| 9  | val       | Valence: higher values mean a more positive sound (happy, cheerful) |
| 10 | dur       | The duration of the song |
| 11 | acous     | The acousticness of the song: likelihood the song is acoustic |
| 12 | spch      | Speechiness: higher values mean more spoken words |
| 13 | pop       | Popularity: higher values mean more popular |

Source of dataset.
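Before relying on the data dictionary, it can help to confirm that the loaded frame actually has the columns it lists. The following is a minimal sketch, not part of the original analysis: `EXPECTED_COLUMNS` and `check_columns` are hypothetical names introduced here, and the column list is simply copied from the dictionary above.

```python
import pandas as pd

# Column names as listed in the data dictionary (hypothetical helper constant).
EXPECTED_COLUMNS = [
    "title", "artist", "top genre", "year", "bpm", "nrgy", "dnce",
    "dB", "live", "val", "dur", "acous", "spch", "pop",
]


def check_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names relative to the data dictionary."""
    actual = list(frame.columns)
    # Columns the dictionary promises but the frame lacks
    missing = [name for name in expected if name not in actual]
    # Columns present in the frame but not documented
    unexpected = [name for name in actual if name not in expected]
    return missing, unexpected
```

Calling `check_columns(df)` on the frame loaded earlier should return two empty lists if the file matches the dictionary; anything else points to a renamed or extra column worth checking before analysis.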

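# Songs with a popularity score above 76, most popular first
pop_artists = df[df['pop'] > 76][['artist', 'top genre', 'pop']]
pop_artists.sort_values('pop', ascending=False)
# Keep numeric columns only; select_dtypes avoids the platform-dependent `dtypes == int` comparison
cols = list(df.select_dtypes(include='number').columns)
cols
# Mean of each numeric feature per year
yearly = df[cols].groupby('year').mean()
yearly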
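A yearly mean is easier to interpret when you also know how many songs it is based on, since years with few songs give noisier averages. Here is a small sketch of one way to keep that count next to the means. The function name `yearly_summary` and the column name `n_songs` are hypothetical, and the sketch assumes the year column is numeric.

```python
import pandas as pd


def yearly_summary(frame: pd.DataFrame, year_col: str = "year") -> pd.DataFrame:
    """Mean of each numeric column per year, plus the number of songs per year."""
    # Restrict to numeric columns (this includes the year column itself)
    numeric = frame.select_dtypes(include="number")
    means = numeric.groupby(year_col).mean()
    # Row count per year, aligned on the year index
    means["n_songs"] = frame.groupby(year_col).size()
    return means
```

For example, `yearly_summary(df)` would return the same per-year means as `yearly`, with an extra `n_songs` column to consult when a single year looks like an outlier.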
plt.plot(yearly.index,yearly['bpm'], marker = 'o', mec = 'black',mfc = 'blue',c='black')
plt.title('Tempo by Year')
plt.ylabel('BPM (Beats Per Minute)')
plt.xlabel('Year')
plt.show()
plt.plot(yearly.index,yearly['nrgy'], marker = 'o', mec = 'black',mfc = 'blue',c='black')
plt.title('Energy by Year')
plt.xlabel('Year')
plt.ylabel('Energy')
plt.show()
plt.plot(yearly.index,yearly['dnce'], marker = 'o', mec = 'black',mfc = 'blue',c='black')
plt.title('Danceability by Year')
plt.xlabel('Year')
plt.ylabel('Danceability')
plt.show()
plt.plot(yearly.index,yearly['dB'], marker = 'o', mec = 'black',mfc = 'blue',c='black')
plt.title('Loudness by Year')
plt.xlabel('Year')
plt.ylabel('Loudness (dB)')
plt.show()
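The four plots above repeat the same pattern with a different column and title each time. As an optional alternative, the sketch below draws them in one figure with a shared x-axis so the trends can be compared at a glance. `plot_yearly_trends` is a hypothetical helper, and the `labels` mapping from column name to display label is an assumption about how you want the panels titled.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_yearly_trends(yearly: pd.DataFrame, labels: dict):
    """Draw one line chart per column in `labels`, stacked in a single figure."""
    fig, axes = plt.subplots(
        len(labels), 1, sharex=True,
        figsize=(6, 2.5 * len(labels)), squeeze=False,
    )
    # One panel per (column, display label) pair, same marker style as above
    for ax, (column, label) in zip(axes[:, 0], labels.items()):
        ax.plot(yearly.index, yearly[column],
                marker="o", mec="black", mfc="blue", c="black")
        ax.set_title(f"{label} by Year")
        ax.set_ylabel(label)
    # Only the bottom panel needs the shared x-axis label
    axes[-1, 0].set_xlabel("Year")
    fig.tight_layout()
    return fig
```

A call such as `plot_yearly_trends(yearly, {"bpm": "Tempo", "nrgy": "Energy", "dnce": "Danceability", "dB": "Loudness"})` followed by `plt.show()` would reproduce the four charts as stacked panels.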