Competition - Dance Party Songs
  Which songs are most suitable for a dance party?

    📖 Background

    It's that vibrant time of year again: summer has arrived (at least for those of us in the Northern Hemisphere)! There's an energy in the air that inspires us to get up and move. In sync with this exuberance, your company has decided to host a dance party to celebrate. And you, with your unique blend of creativity and analytical expertise, have been entrusted with the crucial task of curating a dance-themed playlist that will set the perfect mood for this electrifying night. The question, then, is how to identify the songs that will make the attendees dance their hearts out. This is where your coding skills come into play.

    💾 The Data

    You have assembled information on more than 125 genres of Spotify music tracks in a file called spotify.csv, with each genre containing approximately 1000 tracks. All tracks, from all time, have been taken into account without any time period limitations. However, the data collection was concluded in October 2022. Each row represents a track that has some audio features associated with it.

    | Column | Description |
    |---|---|
    | track_id | The Spotify ID of the track. |
    | artists | Names of the artists who performed the track, separated by a ; if there is more than one. |
    | album_name | The name of the album that includes the track. |
    | track_name | The name of the track. |
    | popularity | A value from 0 to 100, with 100 being the most popular. It is calculated from the number of times the track has been played recently, with more recent plays contributing more to the score. Duplicate tracks are scored independently. |
    | duration_ms | The length of the track, in milliseconds. |
    | explicit | Whether the track contains explicit lyrics: true means it does; false means it does not or that this is unknown. |
    | danceability | A score between 0.0 and 1.0 describing how suitable the track is for dancing. It is calculated by an algorithm from factors such as tempo, rhythm stability, beat strength and overall regularity. |
    | energy | A score between 0.0 and 1.0 indicating the track's intensity and activity level. Energetic tracks tend to be fast, loud and noisy. |
    | key | The key the track is in. Integers map to pitches using standard pitch class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1. |
    | loudness | The overall loudness of the track, in decibels (dB). |
    | mode | The modality of the track: 1 for major, 0 for minor. |
    | speechiness | The amount of spoken words in the track. Values close to 1.0 denote speech-based content, values between 0.33 and 0.66 indicate a mix of speech and music (such as rap), and values below 0.33 usually indicate music and other non-speech tracks. |
    | acousticness | A confidence measure from 0.0 to 1.0 that the track is acoustic, with 1.0 representing the highest confidence. |
    | instrumentalness | The likelihood that the track is instrumental. Non-lyrical sounds such as "ooh" and "aah" count as instrumental, whereas rap or spoken-word tracks are classified as "vocal". Values closer to 1.0 indicate a higher probability that the track has no vocal content. |
    | liveness | The probability that the track was performed live. Scores above 0.8 indicate a high likelihood that the track is live. |
    | valence | A score from 0.0 to 1.0 representing the track's positiveness. High scores suggest a more positive or happier track. |
    | tempo | The track's estimated tempo, in beats per minute (BPM). |
    | time_signature | An estimate of the track's time signature (meter), the notational convention that specifies how many beats are in each bar (or measure). Values range from 3 to 7, indicating time signatures from 3/4 to 7/4. |
    | track_genre | The genre of the track. |
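    Some of these columns are codes rather than readable values. The sketch below turns `key` and `mode` into labels such as "A minor" and buckets `speechiness` using the thresholds above. It also converts `duration_ms` to minutes. The helper names (`key_name`, `speech_band`) and the toy rows are hypothetical and are not taken from spotify.csv. Treating everything above 0.66 as "speech" is an assumption that extends the 0.33 to 0.66 band described above.

```python
import pandas as pd

# Pitch-class names for the integer `key` column (0 = C, 1 = C#/Db, ...); -1 = no key detected
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_name(key: int, mode: int) -> str:
    """Hypothetical helper: readable key label such as 'A minor', or 'unknown' for -1."""
    if key == -1:
        return "unknown"
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


def speech_band(speechiness: float) -> str:
    """Hypothetical helper: bucket speechiness (above 0.66 = speech is an assumption)."""
    if speechiness > 0.66:
        return "speech"
    if speechiness >= 0.33:
        return "speech + music"
    return "music"


# Toy rows for illustration only
tracks = pd.DataFrame({
    "key": [9, 0, -1],
    "mode": [0, 1, 1],
    "speechiness": [0.05, 0.45, 0.90],
    "duration_ms": [210000, 185500, 60000],
})
tracks["key_label"] = [key_name(k, m) for k, m in zip(tracks["key"], tracks["mode"])]
tracks["speech_band"] = tracks["speechiness"].apply(speech_band)
tracks["duration_min"] = tracks["duration_ms"] / 60000  # milliseconds -> minutes
print(tracks)
```

    On the real data, the same calls would apply to the `key`, `mode`, `speechiness` and `duration_ms` columns of the loaded DataFrame.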

    Source (data has been modified)

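    # Some initial tests
    import pandas as pd
    from scipy.stats import pearsonr
    
    # Load the data first: the correlation below needs the `spotify` DataFrame
    spotify = pd.read_csv('data/spotify.csv')
    
    # Linear (Pearson) correlation between tempo and danceability
    corr, _ = pearsonr(spotify['tempo'], spotify['danceability'])
    print("Pearson's correlation: %.3f" % corr)
    
    spotify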
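    Pearson's coefficient only measures *linear* association. A low value for tempo against danceability therefore does not rule out a relationship that is monotonic but curved. Spearman's rank correlation is a common complement for this case. The sketch below uses made-up stand-in data, not spotify.csv, with a cubic relationship that is perfectly monotonic. On such data Spearman reports 1.0 while Pearson stays clearly below it.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical stand-in data: monotonic but non-linear relationship
x = np.linspace(60, 200, 50)        # e.g. tempo values in BPM
y = ((x - 60) / 140) ** 3           # e.g. a 0-1 score that rises steeply at high tempo

pearson_r, _ = pearsonr(x, y)       # linear association
spearman_rho, _ = spearmanr(x, y)   # rank (monotonic) association
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

    On the real data, the equivalent call would be `spearmanr(spotify['tempo'], spotify['danceability'])`. If the relationship peaks at intermediate tempos rather than rising steadily, neither coefficient will capture it well, and a scatter plot is the safer check.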

    Removing duplicates, speech and missing data

    # Duplicates are identified by track name AND artists only, because some artists have the same song on
    # several albums (e.g. "Are You Gonna Be My Girl" by Jet is on both Get Born and Timeless Rock Hits).
    # The track name alone is not enough, since two different songs can share a name.
    dupl_criteria = ['track_name', 'artists']  # maybe 'album_name' (gives 21.66%)
    
    # Among duplicates, keep the copy with the highest popularity.
    # (Could the duplicates' popularity scores be added together? That is complicated, because popularity
    # also depends on how recent the plays are.)
    spotify_popularity = spotify.sort_values('popularity', ascending=False)
    dupl = spotify_popularity.duplicated(subset=dupl_criteria, keep=False)  # flags every copy of a duplicated track
    dupl_share = 100 * spotify_popularity.duplicated(subset=dupl_criteria).mean()
    print(round(dupl_share, 2), '% of tracks are duplicates by these criteria:', dupl_criteria)
    
    # drop_duplicates keeps the first (most popular) copy; sort_index restores the original row order
    spotify_nodpl = spotify_popularity.drop_duplicates(subset=dupl_criteria).sort_index()
    
    # Removing speech. Since there are few such tracks, drop every track with speechiness above 0.33
    speech_criterion = 0.33
    spotify_nospch = spotify_nodpl[spotify_nodpl['speechiness'] <= speech_criterion]
    speech_share = 100 * (spotify_nodpl['speechiness'] > speech_criterion).mean()
    print(round(speech_share, 2), '% of tracks are speech, with speechiness >', speech_criterion)
    
    
    # Removing missing data. 
    # There seems to be one track with a missing name, artist and album. Since we cannot guess if it is the same as any other track, we will drop it 
    
    #print(spotify_nospch.isna().sum())
    spotify_clean = spotify_nospch.dropna()
    print('Now we are left with {:.2f}% of the initial sample.'.format(len(spotify_clean.index)/len(spotify.index)*100))
    
    spotify_clean
    spotify_clean.isna().sum()
    spotify.isna().sum()
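    The three cleaning steps above can be bundled into one reusable function, so the same rules apply to any re-download of the data:

    - deduplicate by track name and artists, keeping the most popular copy;
    - drop tracks with speechiness above 0.33;
    - drop rows with missing values.

    The function name `clean_tracks` and the toy rows are hypothetical. This is a sketch of the same logic, not a replacement for the cells above.

```python
import pandas as pd


def clean_tracks(df: pd.DataFrame,
                 dupl_criteria=("track_name", "artists"),
                 speech_max: float = 0.33) -> pd.DataFrame:
    """Hypothetical helper: keep the most popular copy of each duplicate,
    drop speech-heavy tracks and rows with missing values, keeping row order."""
    deduped = (df.sort_values("popularity", ascending=False)
                 .drop_duplicates(subset=list(dupl_criteria), keep="first")
                 .sort_index())
    no_speech = deduped[deduped["speechiness"] <= speech_max]
    return no_speech.dropna()


# Toy data: the same song appears on two albums, one track is speech, one lacks a name
toy = pd.DataFrame({
    "track_name": ["Song A", "Song A", "Talk B", None],
    "artists": ["X", "X", "Y", "Z"],
    "album_name": ["Album 1", "Hits", "Podcast", "Unknown"],
    "popularity": [40, 70, 10, 5],
    "speechiness": [0.05, 0.05, 0.90, 0.10],
})
clean = clean_tracks(toy)
print(clean)  # only the more popular "Song A" (album "Hits") survives
```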
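    import matplotlib.pyplot as plt
    import numpy as np
    # Danceability of every track by genre; rotate the labels so the genre names stay readable
    plt.scatter(spotify['track_genre'], spotify['danceability'], s=2, alpha=0.3)
    plt.xticks(rotation=90, fontsize=6)
    plt.show()
    # Danceability summary statistics per genre
    spotify.pivot_table(values='danceability', index='track_genre', aggfunc=['mean', 'median', 'max', 'min', 'std'])
    # Share (%) of genres whose mean danceability is at least 0.5 (genre count computed, not hard-coded)
    (spotify.groupby('track_genre')['danceability'].mean() >= 0.5).mean() * 100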
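    Beyond genre, a quick way to see which audio features move with danceability is a rank-correlation table. The feature list below is an assumption: `key` and `mode` are left out because they are categorical codes. The function name `danceability_correlations` is hypothetical. The demo uses random data in which danceability is built mainly from valence, so valence should come out on top. On the cleaned data the call would be `danceability_correlations(spotify_clean)`.

```python
import numpy as np
import pandas as pd

# Assumed set of continuous audio features to compare against danceability
AUDIO_FEATURES = ["energy", "loudness", "speechiness", "acousticness",
                  "instrumentalness", "liveness", "valence", "tempo"]


def danceability_correlations(df: pd.DataFrame) -> pd.Series:
    """Spearman correlation of each audio feature with danceability,
    sorted from most positive to most negative."""
    corr = df[AUDIO_FEATURES + ["danceability"]].corr(method="spearman")
    return corr["danceability"].drop("danceability").sort_values(ascending=False)


# Random demo data (not spotify.csv): danceability driven mostly by valence
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((200, len(AUDIO_FEATURES))), columns=AUDIO_FEATURES)
demo["danceability"] = 0.7 * demo["valence"] + 0.3 * rng.random(200)
corrs = danceability_correlations(demo)
print(corrs)
```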

    💪 Challenge

    Your task is to devise an analytically-backed, dance-themed playlist for the company's summer party. Your choices must be justified with a comprehensive report explaining your methodology and reasoning. Below are some suggestions on how you might want to start curating the playlist:

    • Use descriptive statistics and data visualization techniques to explore the audio features and understand their relationships.
    • Develop and apply a machine learning model that predicts a song's danceability.
    • Interpret the model outcomes and utilize your data-driven insights to curate your ultimate dance party playlist of the top 50 songs according to your model.
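    One possible way to approach the last two suggestions is sketched below. It fits a random-forest regressor that predicts danceability from the other audio features, checks it on a held-out split, and then takes the 50 tracks with the highest predicted danceability. Several choices here are assumptions, not requirements of the challenge: the model type, the `FEATURES` list, the 80/20 split, and the helper name `rank_for_playlist`. The demo uses synthetic data.

    Two caveats apply. First, the playlist is scored on all rows, including the ones the model was trained on. Second, danceability is already observed in the data, so most of the model's value lies in interpretation. `model.feature_importances_` shows which features drive its predictions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Assumed feature set: continuous audio features, excluding danceability itself
FEATURES = ["energy", "loudness", "speechiness", "acousticness",
            "instrumentalness", "liveness", "valence", "tempo"]


def rank_for_playlist(df: pd.DataFrame, n_songs: int = 50, seed: int = 42):
    """Hypothetical helper: fit a random forest, report held-out R^2 and return
    the n_songs tracks with the highest predicted danceability."""
    X, y = df[FEATURES], df["danceability"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    test_r2 = r2_score(y_test, model.predict(X_test))

    # Score every track and keep the top n_songs (nlargest sorts descending)
    scored = df.assign(predicted_danceability=model.predict(X))
    playlist = scored.nlargest(n_songs, "predicted_danceability")
    return model, test_r2, playlist


# Synthetic demo data (not spotify.csv)
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.random((500, len(FEATURES))), columns=FEATURES)
demo["danceability"] = 0.6 * demo["valence"] + 0.4 * demo["energy"]
demo["track_name"] = [f"track {i}" for i in range(len(demo))]

model, test_r2, playlist = rank_for_playlist(demo)
print(f"held-out R^2 = {test_r2:.3f}")
print(playlist[["track_name", "predicted_danceability"]].head())
print(pd.Series(model.feature_importances_, index=FEATURES).sort_values(ascending=False))
```

    On the real data, the cleaned and deduplicated table (for example `spotify_clean`) is the natural input, so the same song does not appear twice in the playlist.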

    🧑‍⚖️ Judging criteria

    Recommendations (35%)
    • Clarity of recommendations: how clear and well presented the recommendation is.
    • Quality of recommendations: are appropriate analytical techniques used, and are the conclusions valid?
    • Number of relevant insights found for the target audience.
    Storytelling (35%)
    • How well the data and insights are connected to the recommendation.
    • How the narrative and the whole report connect together.
    • Balance between making the report in-depth enough and keeping it concise.
    Visualizations (20%)
    • Appropriateness of the visualizations used.
    • Clarity of insight from the visualizations.
    Votes (10%)
    • Upvoting: the most upvoted entries get the most points.

    ✅ Checklist before publishing into the competition

    • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
    • Remove redundant cells like the judging criteria, so the workbook is focused on your story.
    • Make sure the workbook reads well and explains how you found your insights.
    • Try to include an executive summary of your recommendations at the beginning.
    • Check that all the cells run without error.