Competition - Dance Party Songs
💾 The Data
You have assembled information on more than 125 genres of Spotify music tracks in a file called spotify.csv, with approximately 1,000 tracks per genre. Tracks from any period are included, with no time restrictions, but data collection ended in October 2022.
Each row represents a track that has some audio features associated with it.
| Column | Description |
|---|---|
| track_id | The Spotify ID number of the track. |
| artists | Names of the artists who performed the track, separated by a ; if there's more than one. |
| album_name | The name of the album that includes the track. |
| track_name | The name of the track. |
| popularity | A value from 0 to 100, with 100 being the most popular. It is calculated from the number of times the track has been played recently, with more recent plays contributing more to the score. Duplicate tracks are scored independently. |
| duration_ms | The length of the track, measured in milliseconds. |
| explicit | Indicates whether the track contains explicit lyrics. true means it does; false means it does not or it's unknown. |
| danceability | A score between 0.0 and 1.0 that represents the track's suitability for dancing. It is calculated by an algorithm from factors like tempo, rhythm stability, beat strength, and regularity. |
| energy | A score between 0.0 and 1.0 indicating the track's intensity and activity level. Energetic tracks tend to be fast, loud, and noisy. |
| key | The key the track is in. Integers map to pitches using standard pitch class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1. |
| loudness | The overall loudness, measured in decibels (dB). |
| mode | The modality of the track, represented as 1 for major and 0 for minor. |
| speechiness | Measures the amount of spoken words in a track. A value close to 1.0 denotes speech-based content, while 0.33 to 0.66 indicates a mix of speech and music, like rap. Values below 0.33 are usually music and non-speech tracks. |
| acousticness | A confidence measure from 0.0 to 1.0, with 1.0 representing the highest confidence that the track is acoustic. |
| instrumentalness | Estimates the likelihood of a track being instrumental. Non-lyrical sounds such as "ooh" and "aah" are considered instrumental, whereas rap or spoken word tracks are classified as "vocal". A value closer to 1.0 indicates a higher probability that the track lacks vocal content. |
| liveness | A measure of the probability that the track was performed live. Scores above 0.8 indicate a high likelihood of the track being live. |
| valence | A score from 0.0 to 1.0 representing the track's positiveness. High scores suggest a more positive or happier track. |
| tempo | The track's estimated tempo, measured in beats per minute (BPM). |
| time_signature | An estimate of the track's time signature (meter), a notational convention that specifies how many beats are in each bar (or measure). Values range from 3 to 7, indicating time signatures of 3/4 to 7/4. |
| track_genre | The genre of the track. |
Source (data has been modified)
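Several columns in the table are encoded (key, mode, speechiness, duration_ms), so a few small decoding helpers can make exploration easier to read. The sketch below is only one possible way to do this: the function names are hypothetical, and treating any speechiness above 0.66 as "speech" is an assumption, since the table only says that values close to 1.0 denote speech-based content.

```python
# Pitch class notation: index 0 is C, then one semitone up per step.
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def describe_key(key, mode):
    """Turn the integer `key` and `mode` columns into a label such as 'D minor'."""
    if key == -1:
        return "unknown"  # -1 means no key was detected
    if not 0 <= key <= 11:
        raise ValueError(f"key must be -1 or 0..11, got {key}")
    quality = "major" if mode == 1 else "minor"  # mode: 1 = major, 0 = minor
    return f"{PITCH_CLASSES[key]} {quality}"


def speech_category(speechiness):
    """Bucket a speechiness score using the cut-offs given in the table."""
    if speechiness < 0.33:
        return "music"
    if speechiness <= 0.66:
        return "mixed"
    return "speech"  # assumption: everything above 0.66 counts as speech


def format_duration(duration_ms):
    """Convert milliseconds to a 'minutes:seconds' string."""
    minutes, seconds = divmod(duration_ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


print(describe_key(2, 0))
print(speech_category(0.5))
print(format_duration(215000))
```

With a loaded DataFrame, these could be applied row by row, for example `spotify['speechiness'].map(speech_category)`.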
import pandas as pd
import matplotlib.pyplot as plt # Import matplotlib
import seaborn as sns
# Load the Spotify dataset
spotify = pd.read_csv('data/spotify.csv')
# Data Exploration
# Count missing values in each column and display the result
missing_values = spotify.isnull().sum()
print(missing_values)
# Data Visualization
# Visualize the distribution of danceability
plt.figure(figsize=(10, 6))
sns.histplot(spotify['danceability'], bins=30, kde=True)
plt.title('Distribution of Danceability')
plt.xlabel('Danceability')
plt.ylabel('Frequency')
plt.show()
# Rest of your analysis and code...
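
One possible next step is to shortlist candidate party tracks by filtering on danceability and energy, then ranking by popularity. The sketch below is an illustration under stated assumptions, not a recommended method: the function name `shortlist_dance_tracks`, the thresholds (0.7 and 0.6), and the small sample DataFrame are all hypothetical, and it assumes that a track listed under several genres repeats its track_id.

```python
import pandas as pd


def shortlist_dance_tracks(df, min_danceability=0.7, min_energy=0.6, top_n=50):
    """Return up to top_n tracks that pass both thresholds, most popular first."""
    # Keep only rows that clear both (hypothetical) thresholds.
    mask = (df["danceability"] >= min_danceability) & (df["energy"] >= min_energy)
    candidates = df.loc[mask]
    # Sort first so that, for repeated track_ids, the most popular copy is kept.
    candidates = candidates.sort_values("popularity", ascending=False)
    candidates = candidates.drop_duplicates(subset="track_id", keep="first")
    return candidates.head(top_n).reset_index(drop=True)


# Tiny made-up sample standing in for spotify.csv (values are illustrative only).
sample = pd.DataFrame({
    "track_id": ["a1", "a1", "b2", "c3"],
    "track_name": ["Track A", "Track A", "Track B", "Track C"],
    "popularity": [80, 75, 60, 90],
    "danceability": [0.85, 0.85, 0.72, 0.40],
    "energy": [0.90, 0.90, 0.65, 0.95],
    "track_genre": ["dance", "pop", "house", "metal"],
})

shortlist = shortlist_dance_tracks(sample)
print(shortlist[["track_name", "popularity", "track_genre"]])
```

On the real data, the same function could be called as `shortlist_dance_tracks(spotify)`. The thresholds are worth tuning against the danceability histogram rather than taken as given.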