Table of Contents:

  1. Introduction
  2. Importing Data and Libraries
  3. Data Cleaning, Exploring, and Visualization
  4. Feature Engineering
  5. Model Selection, Training, and Validation
  6. Hyperparameter Tuning
  7. Feature Importance Extraction
  8. Final Model Deployment
  9. Dance-Themed Playlist Curation

1. Introduction

Imagine a world where the beats of music come alive and data and dance sync into one electrifying fusion. In this musical journey, data science is used to curate a playlist that is not just a collection of songs but a carefully crafted dance experience. The top 50 songs it surfaces are selected to make your heart race, your feet move, and your spirits soar. Welcome to the future of party playlists, where science meets the dance floor. Let's dive into the data and start the dance adventure!

2. Importing Data and Libraries

# Import libraries and modules for data exploration and analysis

import numpy as np
import pandas as pd
from datetime import datetime

# Import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
%matplotlib inline
# Matplotlib 3.6 renamed the bundled seaborn styles ('seaborn-white' -> 'seaborn-v0_8-white')
plt.style.use('seaborn-v0_8-white')

# Import libraries and modules for machine learning
from sklearn.preprocessing import MinMaxScaler
from scipy.stats import boxcox, yeojohnson
from sklearn.model_selection import (train_test_split, RandomizedSearchCV,
                                     cross_val_predict, KFold)
from sklearn.linear_model import LinearRegression, Lasso, HuberRegressor, Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.inspection import permutation_importance

# Load the Spotify dataset and display it
spotify = pd.read_csv('data/spotify.csv')
spotify
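Matplotlib 3.6 deprecated the old seaborn style names in favour of a `seaborn-v0_8-` prefix, and later releases dropped the old names, so a hard-coded style name can fail depending on the installed version. A minimal sketch of a version-tolerant choice, assuming a hypothetical helper `choose_style` that returns the first preferred name found in a list of available styles:

```python
from typing import Iterable, Sequence


# Hypothetical helper (not part of matplotlib): return the first name in
# `preferred` that appears in `available`, e.g. matplotlib's plt.style.available.
def choose_style(
    available: Iterable[str],
    preferred: Sequence[str] = ("seaborn-v0_8-white", "seaborn-white"),
    fallback: str = "default",
) -> str:
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    # Neither seaborn variant is bundled: fall back to matplotlib's default style.
    return fallback


print(choose_style(["classic", "seaborn-v0_8-white"]))  # seaborn-v0_8-white
print(choose_style(["classic", "seaborn-white"]))       # seaborn-white
print(choose_style(["classic"]))                        # default
```

In the notebook this would be called as `plt.style.use(choose_style(plt.style.available))`, which prefers the new name and still works on older Matplotlib releases.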

3. Data Cleaning, Exploring, and Visualization

# Print the first 5 rows of the DataFrame
print(spotify.head())

# Summarize column dtypes and non-null counts (info() prints directly and returns None)
spotify.info()

# Count missing values per column
missing_values = spotify.isna().sum()
print(missing_values)

# Plot missing values per column: a bar turns red when the column is missing more than 5% of its rows
threshold = 0.05 * len(spotify)
ax = missing_values.plot(kind='bar', color=['gray' if val <= threshold else 'red' for val in missing_values], figsize=(8, 6))

# Set plot labels and title
ax.set_xlabel('Columns')
ax.set_ylabel('Number of missing values')
ax.set_title('Missing Values in Spotify Dataset')

# Show the plot
plt.show()
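The bar colours encode a simple rule: a column is flagged when its missing count exceeds 5% of the rows. The same rule can be checked numerically with missing fractions. A minimal sketch on a small illustrative DataFrame (toy values, not the Spotify data):

```python
import numpy as np
import pandas as pd

# Toy DataFrame (illustrative only): 25 rows,
# 'artists' has 1 missing value (4%), 'track_name' has 3 (12%).
toy = pd.DataFrame({
    "track_id": [f"t{i}" for i in range(25)],
    "artists": ["a"] * 24 + [np.nan],
    "track_name": ["x"] * 22 + [np.nan] * 3,
})

# Fraction of missing values per column (mean of the boolean isna() mask).
missing_frac = toy.isna().mean()

# Same rule as the bar colours: flag columns missing more than 5% of rows.
threshold = 0.05
flagged = missing_frac[missing_frac > threshold]

print(missing_frac.round(2).to_dict())
print(list(flagged.index))  # ['track_name']
```

Working with fractions rather than raw counts makes the 5% cut-off readable directly from the printed values.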
# Identify and print rows with missing values
print(spotify[spotify.isnull().any(axis=1)])
# Remove rows with missing values
spotify_c = spotify.dropna()
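Because `dropna()` with its default `how='any'` removes a row if any single column is missing, it is worth reporting how many rows it actually discards. A minimal sketch on illustrative toy data:

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative only): rows 1 and 3 each have one missing cell.
toy = pd.DataFrame({
    "track_id": ["t0", "t1", "t2", "t3"],
    "artists": ["a", np.nan, "c", "d"],
    "popularity": [50, 60, 70, np.nan],
})

# dropna() (default how='any') removes every row containing at least one NaN.
toy_c = toy.dropna()

# Report how many rows were lost.
removed = len(toy) - len(toy_c)
print(f"Removed {removed} of {len(toy)} rows ({removed / len(toy):.1%})")
```

Applied to the real data, the same two-line check compares `len(spotify)` with `len(spotify_c)`.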
# Count exact duplicate rows (every copy after a row's first occurrence)
spotify.duplicated().sum()

There are 444 exact duplicates: rows whose values match an earlier row in every column.
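With its default `keep='first'`, `duplicated()` flags every copy after a row's first occurrence, so its sum equals the number of rows `drop_duplicates()` would remove; `keep=False` instead flags every member of a duplicated group. A minimal sketch on illustrative toy data:

```python
import pandas as pd

# Toy frame (illustrative only): the first row appears three times, the last once.
toy = pd.DataFrame({
    "track_id": ["t0", "t0", "t0", "t1"],
    "genre": ["pop", "pop", "pop", "rock"],
})

# keep='first' (the default) flags every copy after the first occurrence.
n_dupes = toy.duplicated().sum()

# keep=False flags all members of a duplicated group, including the first.
n_involved = toy.duplicated(keep=False).sum()

# drop_duplicates() removes exactly the rows flagged by duplicated().
deduped = toy.drop_duplicates()

print(n_dupes, n_involved, len(deduped))  # 2 3 2
```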

# Count occurrences of each unique value in the 'track_id' column and
# name the columns 'id' and 'count' (gives the same names on pandas 1.x and 2.x)
id_counts = spotify['track_id'].value_counts().rename_axis('id').reset_index(name='count')

# Print id_counts
print(id_counts)
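A `track_id` can still appear more than once after exact duplicates are removed if its rows differ in some other column. A minimal sketch that isolates such IDs, on illustrative toy data (the `genre` column here is an assumption for the example):

```python
import pandas as pd

# Toy frame (illustrative only): t0 is an exact duplicate pair,
# t1 repeats with a different 'genre', t2 appears once.
toy = pd.DataFrame({
    "track_id": ["t0", "t0", "t1", "t1", "t2"],
    "genre": ["pop", "pop", "pop", "dance", "rock"],
})

# Drop exact duplicates first, then count the remaining rows per track_id.
# rename_axis + reset_index(name=...) yields the columns 'id' and 'count'.
counts = (
    toy.drop_duplicates()["track_id"]
    .value_counts()
    .rename_axis("id")
    .reset_index(name="count")
)

# IDs still repeated after deduplication differ in at least one other column.
repeated_ids = counts.loc[counts["count"] > 1, "id"].tolist()
print(repeated_ids)  # ['t1']
```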