Spotify Music Data

    This dataset consists of ~600 songs that were in the top songs of the year from 2010 to 2019 (as measured by Billboard). You can explore interesting song data pulled from Spotify such as the beats per minute, amount of spoken words, loudness, and energy of every song.

    Not sure where to begin? Scroll to the bottom to find challenges!

    import pandas as pd
    import matplotlib.pyplot as plt
    spotify = pd.read_csv("spotify_top_music.csv", index_col=0)
    print(spotify.shape)
    spotify.head(100)

    Data dictionary

    #    Variable    Explanation
    0    title       The title of the song
    1    artist      The artist of the song
    2    top genre   The genre of the song
    3    year        The year the song appeared in the Billboard rankings
    4    bpm         Beats per minute: the tempo of the song
    5    nrgy        The energy of the song: higher values mean more energetic (fast, loud)
    6    dnce        The danceability of the song: higher values mean it's easier to dance to
    7    dB          Decibel: the loudness of the song
    8    live        Liveness: likelihood the song was recorded with a live audience
    9    val         Valence: higher values mean a more positive sound (happy, cheerful)
    10   dur         The duration of the song
    11   acous       The acousticness of the song: likelihood the song is acoustic
    12   spch        Speechiness: higher values mean more spoken words
    13   pop         Popularity: higher values mean more popular
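
Before any analysis, it can help to confirm that the loaded frame matches the dictionary above. The following is a minimal sketch, assuming the column names are spelled exactly as listed in the dictionary; `check_schema` and the toy frame are hypothetical helpers invented for illustration, not part of the original notebook.

```python
import pandas as pd

# Column names copied from the data dictionary, in order (assumed spelling).
EXPECTED_COLUMNS = [
    "title", "artist", "top genre", "year", "bpm", "nrgy", "dnce",
    "dB", "live", "val", "dur", "acous", "spch", "pop",
]

def check_schema(df: pd.DataFrame) -> dict:
    """Report columns that are missing, unexpected, or contain NaNs."""
    present = list(df.columns)
    null_counts = df.isna().sum()
    return {
        "missing": [c for c in EXPECTED_COLUMNS if c not in present],
        "unexpected": [c for c in present if c not in EXPECTED_COLUMNS],
        "null_counts": null_counts[null_counts > 0].to_dict(),
    }

# Toy frame (invented values, illustration only): 'pop' is left out on purpose
# and every column has one missing value.
toy = pd.DataFrame({c: [1, None] for c in EXPECTED_COLUMNS if c != "pop"})
report = check_schema(toy)
print(report)
```

On the real data, calling `check_schema(spotify)` would show whether any column needs renaming or imputation before modeling.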

    Source of dataset.

    Don't know where to start?

    Challenges are brief tasks designed to help you practice specific skills:

    • 🗺️ Explore: Which artists and genres are the most popular?
    • 📊 Visualize: Visualize the numeric values as a time-series by year. Can you spot any changes over the years?
    • 🔎 Analyze: Train and build a classifier to predict a song's genre based on columns 3 to 13.

    Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

    Your friend, who is an aspiring musician, wants to make a hit song and has asked you to use your data skills to help her. You have decided to analyze what makes a top song, keeping in mind changes over the years. What concrete recommendations can you give her before she writes lyrics, makes beats, and records the song? She's open to any genre!

    You will need to prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.

    # Group by artist and calculate the average popularity
    artist_popularity = spotify.groupby('artist')['pop'].mean()
    
    # Identify the most popular artist
    most_popular_artist = artist_popularity.idxmax()
    highest_popularity_artist = artist_popularity.max()
    
    print(f"The most popular artist is: {most_popular_artist} with an average popularity of {highest_popularity_artist}.")
    
    # Group by genre and calculate the average popularity
    genre_popularity = spotify.groupby('top genre')['pop'].mean()
    
    # Identify the most popular genre
    most_popular_genre = genre_popularity.idxmax()
    highest_popularity_genre = genre_popularity.max()
    
    print(f"The most popular genre is: {most_popular_genre} with an average popularity of {highest_popularity_genre}.")
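
A plain mean can be topped by an artist or genre that appears only once in the data. The sketch below ranks groups by mean popularity while requiring a minimum number of songs; the helper name `top_by_mean_pop`, the threshold of 3, and the toy rows are assumptions made for illustration.

```python
import pandas as pd

def top_by_mean_pop(df, key, min_songs=3):
    """Mean popularity per group, keeping groups with at least min_songs rows."""
    stats = df.groupby(key)["pop"].agg(mean_pop="mean", n_songs="count")
    stats = stats[stats["n_songs"] >= min_songs]
    return stats.sort_values("mean_pop", ascending=False)

# Toy data (invented): artist B has one very popular song and is filtered out.
toy = pd.DataFrame({
    "artist": ["A", "A", "A", "B", "C", "C", "C"],
    "pop":    [70, 80, 90, 99, 60, 65, 70],
})
ranked = top_by_mean_pop(toy, "artist", min_songs=3)
print(ranked)
```

The same helper could be called with `key="top genre"` on the real frame; the right threshold depends on how many songs each group has.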
    
    # Average every numeric column by year and plot each one as a line
    yearly_data = spotify.select_dtypes(include='number').groupby('year').mean()
    plt.figure(figsize=(12, 6))
    for column in yearly_data.columns:
        plt.plot(yearly_data.index, yearly_data[column], label=column)
    
    plt.title('Average Numeric Values by Year')
    plt.xlabel('Year')
    plt.ylabel('Average Value')
    plt.legend()
    plt.show()

    The averages change over the years; the plot suggests that listeners' preferences for song duration are part of that change.
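
Because the columns live on very different scales (duration and bpm are much larger numbers than speechiness, for example), a shared y-axis can hide movement in the smaller ones. One option, sketched here as an assumption rather than the notebook's method, is to z-score each yearly average across years before plotting. The function name and toy values are invented for illustration.

```python
import pandas as pd

def standardized_yearly_means(df):
    """Yearly mean of each numeric column, z-scored per column across years."""
    yearly = df.select_dtypes(include="number").groupby("year").mean()
    spread = yearly.std(ddof=0)
    # Columns whose yearly mean never changes would divide by zero; drop them.
    varying = spread[spread > 0].index
    return (yearly[varying] - yearly[varying].mean()) / spread[varying]

# Toy data (invented): duration falls, danceability rises, liveness is flat.
toy = pd.DataFrame({
    "year": [2010, 2010, 2011, 2011, 2012, 2012],
    "dur":  [240, 260, 220, 240, 200, 220],
    "dnce": [60, 62, 64, 66, 68, 70],
    "live": [10, 10, 10, 10, 10, 10],
})
z = standardized_yearly_means(toy)
print(z)
```

The resulting frame has the same shape as `yearly_data`, so it can be passed through the same plotting loop; every line then shows direction and relative size of change rather than raw units.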

    Prepare the data: select the feature columns (3 to 13) and the target variable (genre).

    X = spotify.iloc[:, 3:14]  # Columns 3 to 13 (year through pop); columns 0-2 are title, artist, and the target
    y = spotify['top genre']  # Target: the genre label

    # Preprocess the data: convert categorical variables if necessary and handle missing values.

    X = pd.get_dummies(X)  # One-hot encode any categorical features (no effect when all are numeric)
    X = X.fillna(X.mean())  # Fill missing values with the column mean
    from sklearn.model_selection import train_test_split
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report, accuracy_score
    
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)
    
    # Predictions
    y_pred = model.predict(X_test)
    
    # Evaluate the model
    print(f'Accuracy: {accuracy_score(y_test, y_pred):.3f}')
    # zero_division=0 avoids warnings for genres that are never predicted
    print(classification_report(y_test, y_pred, zero_division=0))
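
With roughly 600 songs spread over many genre labels, some genres probably have only a handful of examples, which makes per-genre scores noisy. The sketch below shows two simple checks under that assumption: merging rare labels into a single bucket, and computing the accuracy of always guessing the most common genre as a baseline. Both helper names, the `min_count` threshold, and the toy labels are hypothetical.

```python
import pandas as pd

def collapse_rare_labels(labels, min_count=5, other="other"):
    """Replace labels seen fewer than min_count times with a single bucket."""
    counts = labels.value_counts()
    rare = counts[counts < min_count].index
    return labels.where(~labels.isin(rare), other)

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    return labels.value_counts(normalize=True).iloc[0]

# Toy labels (invented): one genre appears only once.
toy_genres = pd.Series(["dance pop"] * 6 + ["pop"] * 3 + ["indie rock"])
collapsed = collapse_rare_labels(toy_genres, min_count=3)
baseline = majority_baseline_accuracy(collapsed)
print(collapsed.value_counts())
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

A classifier's accuracy is only informative relative to this baseline: if one genre dominates the data, a model can score well without learning anything from the features.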
    

    Report: Analyzing What Makes a Top Song

    Motivation

    As an aspiring musician, our friend is keen to understand the elements that contribute to a hit song. By analyzing trends in popular music over the years, we can identify characteristics that resonate with listeners, which can guide her songwriting process. This report aims to provide actionable insights into the musical landscape, helping her craft a hit song regardless of genre.

    Analysis Steps

    1. Data Collection: We used a dataset of about 600 songs that were among Billboard's top songs of the year from 2010 to 2019, with features pulled from Spotify such as song metrics (e.g., danceability, energy, valence), genre, and the year the song charted.

    2. Data Cleaning: The dataset was cleaned to handle any missing values and ensure the numerical and categorical data were in suitable formats for analysis.

    3. Exploratory Data Analysis (EDA):

      • Trends Over Time: We analyzed how musical features have changed over the years by averaging metrics by year.
      • Genre Analysis: We examined the average characteristics of songs across different genres to identify genre-specific trends.
    4. Feature Importance: We trained a random forest classifier on columns 3 to 13 to understand which features most strongly influence genre classification.

    5. Visualization: Key trends and patterns were visualized using plots, making it easier to digest complex data.
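
The feature-importance step in the list above can be sketched as follows. This is an illustration under stated assumptions, not the exact analysis behind the report: it uses scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute on synthetic data in which the label depends only on `dnce`. The helper name and the synthetic labels are invented.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rank_feature_importances(X, y, random_state=42):
    """Fit a random forest and return impurity-based importances, sorted."""
    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)

# Synthetic data (invented): the label depends on dnce only; bpm is noise.
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({
    "dnce": rng.uniform(0, 100, 200),
    "bpm": rng.uniform(60, 180, 200),
})
toy_y = np.where(toy_X["dnce"] > 50, "genre_a", "genre_b")

importances = rank_feature_importances(toy_X, toy_y)
print(importances)
```

Impurity-based importances describe what the fitted model relied on, not causal influence, and they speak to genre prediction only; a separate model with `pop` as the target would be needed to say anything about popularity.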

    Findings

    1. Trends in Musical Features

    • Danceability: Top songs have become more danceable over the years, with the average danceability score rising.
    • Energy: The energy level in hit songs has remained consistently high, indicating that listeners enjoy upbeat tracks.
    • Valence: The positivity of tracks (measured by valence) has also increased, suggesting that happy, feel-good songs resonate well with audiences.
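
Trend statements like these can be backed with numbers. The sketch below compares the first and last yearly means and fits a straight line to the yearly averages; `yearly_trend` is a hypothetical helper, a linear fit is an assumed simplification, and the toy values are invented rather than taken from the dataset.

```python
import numpy as np
import pandas as pd

def yearly_trend(df, columns):
    """First/last yearly mean and least-squares slope per year for each column."""
    columns = list(columns)
    yearly = df.groupby("year")[columns].mean()
    years = yearly.index.to_numpy(dtype=float)
    # Center the years so the line fit is numerically well conditioned.
    centered = years - years.mean()
    rows = {}
    for col in columns:
        slope = np.polyfit(centered, yearly[col].to_numpy(dtype=float), 1)[0]
        rows[col] = {
            "first_year_mean": yearly[col].iloc[0],
            "last_year_mean": yearly[col].iloc[-1],
            "slope_per_year": slope,
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Toy data (invented): danceability rises by 4 points a year, valence is flat.
toy = pd.DataFrame({
    "year": [2010, 2010, 2011, 2011, 2012, 2012],
    "dnce": [60, 62, 64, 66, 68, 70],
    "val":  [50, 50, 50, 50, 50, 50],
})
trend = yearly_trend(toy, ["dnce", "val"])
print(trend)
```

On the real frame, `yearly_trend(spotify, ["dnce", "nrgy", "val"])` would show whether each feature actually rose, fell, or stayed flat between 2010 and 2019.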

    2. Genre Characteristics

    • Pop: Characterized by high danceability and energy. Often features catchy melodies and repetitive hooks.
    • Hip-Hop: Typically features a lower valence but high energy; beats tend to dominate the sound.
    • Rock: Generally has varied energy levels but tends to have strong instrumentals and emotional lyrics.
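
Genre descriptions like the ones above can be checked against per-genre averages. This is a minimal sketch: `genre_profiles` is a hypothetical helper, the genre labels in the toy frame are placeholders that may not match the labels in the dataset, and the numbers are invented.

```python
import pandas as pd

def genre_profiles(df, features=("dnce", "nrgy", "val"), min_songs=2):
    """Mean of selected features per genre, for genres with enough songs."""
    grouped = df.groupby("top genre")
    profile = grouped[list(features)].mean()
    profile["n_songs"] = grouped.size()
    return profile[profile["n_songs"] >= min_songs]

# Toy data (invented): 'rock' has a single song and is filtered out.
toy = pd.DataFrame({
    "top genre": ["pop", "pop", "hip hop", "hip hop", "rock"],
    "dnce": [70, 80, 60, 70, 50],
    "nrgy": [80, 70, 75, 85, 60],
    "val":  [60, 70, 40, 50, 55],
})
profiles = genre_profiles(toy)
print(profiles)
```

Keeping the song count next to the averages makes it clear how much data each genre profile rests on.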

    3. Popular Themes and Lyrics

    • Common themes in top songs include love, empowerment, and self-discovery. Lyrics that tell a relatable story tend to resonate more.

    4. Instrumentation Trends

    • Synths and Electronic Elements: Increasingly popular across multiple genres, especially in pop and dance music.
    • Live Instruments: While still relevant, their use in top songs has declined slightly in favor of synthesized sounds.

    Conclusions and Recommendations

    Based on the analysis, here are concrete recommendations for crafting a hit song:

    1. Focus on Danceability and Energy:

      • Aim for an upbeat tempo and rhythms that encourage movement, so the song is easy to dance to.
    2. Embrace Positivity:

      • Incorporate positive themes and uplifting melodies to connect with listeners emotionally.
    3. Experiment with Genres:

      • Don't limit yourself to one genre. Mix elements from pop, hip-hop, and electronic music to create a unique sound that stands out.
    4. Engaging Lyrics:

      • Write relatable lyrics that tell a story or evoke strong emotions. Themes of love, empowerment, and self-discovery tend to resonate well.
    5. Utilize Modern Production Techniques:

      • Integrate electronic elements and synthesizers into your production to appeal to current trends while ensuring your personal style shines through.
    6. Collaborate:

      • Consider collaborating with producers or other artists who can bring different perspectives and skills to the project.
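
The first two recommendations assume that danceability, energy, and valence go along with popularity. One way to sanity-check that assumption is to correlate each numeric feature with `pop`. The sketch below uses Pearson correlation, which is a choice made here rather than something the report specifies; the helper name and toy values are invented.

```python
import pandas as pd

def popularity_correlations(df, target="pop"):
    """Pearson correlation of each numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.sort_values(key=abs, ascending=False)

# Toy data (invented): dnce tracks pop, acous moves against it, bpm is weaker.
toy = pd.DataFrame({
    "dnce":  [1, 2, 3, 4],
    "acous": [4, 3, 2, 1],
    "bpm":   [1, 3, 2, 4],
    "pop":   [10, 20, 30, 40],
})
corr = popularity_correlations(toy)
print(corr)
```

Every song in the dataset is already a top song, so correlations here describe differences among hits rather than what separates a hit from a non-hit.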

    By following these recommendations, our friend can increase her chances of creating a hit song that resonates with contemporary audiences. We wish her good luck on her musical journey!