Spotify Music Data

    This dataset contains about 600 songs that ranked among the top songs of each year from 2010 to 2019, as measured by Billboard. Each song comes with audio features pulled from Spotify, such as beats per minute, amount of spoken words, loudness, and energy.

    Not sure where to begin? Scroll to the bottom to find challenges!

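    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import plotly.express as px

    # Use the ggplot style for all matplotlib figures in this notebook.
    plt.style.use('ggplot')

    # The first CSV column holds the row index, so use it as the DataFrame index.
    df = pd.read_csv("spotify_top_music.csv", index_col=0)
    df.head()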

    Data dictionary

    Variable    Explanation
    title       The title of the song
    artist      The artist of the song
    top genre   The genre of the song
    year        The year the song appeared in the Billboard ranking
    bpm         Beats per minute: the tempo of the song
    nrgy        Energy: higher values mean more energetic (fast, loud)
    dnce        Danceability: higher values mean it's easier to dance to
    dB          Decibels: the loudness of the song
    live        Liveness: likelihood the song was recorded with a live audience
    val         Valence: higher values mean a more positive sound (happy, cheerful)
    dur         The duration of the song, in seconds
    acous       Acousticness: likelihood the song is acoustic
    spch        Speechiness: higher values mean more spoken words
    pop         Popularity: higher values mean more popular

    Source of dataset.
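The data dictionary can double as a quick sanity check on a loaded file. The following is a minimal sketch, not part of the original notebook: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here, and the column list is simply copied from the table above.

```python
# Column names copied from the data dictionary; the helper name is hypothetical.
EXPECTED_COLUMNS = [
    "title", "artist", "top genre", "year", "bpm", "nrgy", "dnce",
    "dB", "live", "val", "dur", "acous", "spch", "pop",
]


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Example with a deliberately incomplete header (illustrative only).
print(missing_columns(["title", "artist", "year", "bpm"]))
```

With the notebook's DataFrame, `missing_columns(df.columns)` should return an empty list, assuming the CSV matches the dictionary.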

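    # Column types and non-null counts, then summary statistics.
    df.info()
    df.describe()

    # Songs with a popularity score above 76, most popular first.
    pop_artists = df[df['pop'] > 76][['artist', 'top genre', 'pop']]
    pop_artists.sort_values(['pop'], ascending=False)

    # Select the numeric columns. select_dtypes avoids comparing against the
    # built-in int, whose matching NumPy dtype can differ between platforms.
    cols = df.select_dtypes(include='number').columns.tolist()
    cols

    # Mean of every numeric feature per year.
    yearly = df[cols].groupby('year').mean()
    yearly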
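To make the yearly aggregation concrete, here is a small sketch on a made-up frame. The rows in `toy` are invented for illustration and are not values from the dataset; `numeric_only=True` is used here as an alternative to selecting the numeric columns first, so the text column is dropped before averaging.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the dataset.
toy = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "year": [2010, 2010, 2011, 2011],
    "bpm": [100, 120, 90, 110],
    "pop": [80, 60, 70, 50],
})

# Group rows by year and average each numeric column within the group.
toy_yearly = toy.groupby("year").mean(numeric_only=True)
print(toy_yearly)
```

The result has one row per year and one column per numeric feature, which is the same shape `yearly` takes in the notebook.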
    # Plot the yearly mean of each audio feature, one figure per feature.
    # Maps column name -> (plot title, y-axis label).
    features = {
        'bpm': ('Tempo by Year', 'BPM (Beats Per Minute)'),
        'nrgy': ('Energy by Year', 'Energy'),
        'dnce': ('Danceability by Year', 'Danceability'),
        'dB': ('Loudness by Year', 'Loudness (dB)'),
        'live': ('Liveness by Year', 'Liveness'),
        'dur': ('Duration by Year', 'Duration (seconds)'),
    }
    for col, (title, ylabel) in features.items():
        plt.plot(yearly.index, yearly[col], marker='o', mec='black', mfc='blue', c='black')
        plt.title(title)
        plt.xlabel('Year')
        plt.ylabel(ylabel)
        plt.show()