SleepInc: Helping you find better sleep 😴
📖 Background
Your client is SleepInc, a sleep health company that recently launched a sleep-tracking app called SleepScope. The app monitors sleep patterns and collects users' self-reported data on lifestyle habits. SleepInc wants to identify lifestyle, health, and demographic factors that strongly correlate with poor sleep quality. They need your help to produce visualizations and a summary of findings for their next board meeting! They need these to be easily digestible for a non-technical audience!
📂 Preview of data
import pandas as pd
raw_data = pd.read_csv('sleep_health_data.csv')
raw_data
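The first step of the method is validating the data. Here is a minimal sketch of the kind of checks that could be run on `raw_data`. The small frame and the helper name `validation_report` are hypothetical stand-ins, not part of the original notebook:

```python
import pandas as pd

# Hypothetical toy frame standing in for sleep_health_data.csv
df = pd.DataFrame({
    "Sleep Duration": [6.1, 7.8, 7.8, None],
    "Stress Level": [6, 3, 3, 4],
    "Quality of Sleep": [6, 8, 8, 7],
})

def validation_report(frame: pd.DataFrame) -> dict:
    """Summarise basic data-quality checks (hypothetical helper)."""
    return {
        # Number of missing values in each column
        "missing": frame.isna().sum().to_dict(),
        # Number of rows that exactly repeat an earlier row
        "duplicates": int(frame.duplicated().sum()),
        # Column data types, to spot numbers stored as text
        "dtypes": frame.dtypes.astype(str).to_dict(),
    }

report = validation_report(df)
print(report["missing"])
print(report["duplicates"])
```

On the real data this would be `validation_report(raw_data)`.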
📄 Executive Summary
🎯 Aim:
To find out which lifestyle, health, and demographic factors affect sleep quality.
🛠 Method:
- Validate all data
- Build a machine learning model
- Assess how well it performs
- Tune it so its predictions (hopefully) improve
- Look at the impact of each column on the predictions (the feature importances calculated by the model)
- Make charts of the columns that turn out to be significant
- Double-check the model's findings against the data
- Form a conclusion
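The tuning and feature-importance steps in the list above could look roughly like this. It is a standalone sketch on synthetic data: the column names, the parameter grid and the scoring choice are assumptions, not the settings used in this notebook. `GridSearchCV` tries every combination in `param_grid` with cross-validation, and `feature_importances_` then gives one importance value per column for the best model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (hypothetical): quality rises with sleep duration,
# falls with stress, and "Noise" is unrelated to the target.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Sleep Duration": rng.uniform(5, 9, n),
    "Stress Level": rng.integers(3, 9, n),
    "Noise": rng.normal(size=n),
})
y = 2 * X["Sleep Duration"] - X["Stress Level"] + rng.normal(scale=0.3, size=n)

# Try every combination in the (assumed) grid with 3-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X, y)

# Impurity-based importances of the refitted best model, one per column
importances = pd.Series(
    search.best_estimator_.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(search.best_params_)
print(importances)
```

Importances like these show how much the model relied on each column, not whether a column raises or lowers sleep quality, so the direction still has to be checked separately (for example with charts).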
🏁 Results:
According to the machine learning model and the statistics, these factors are linked to better sleep quality.
To get better sleep, you should:
- Sleep longer (or go to bed earlier)
- Keep your stress low (try to relax)
- Be physically fit: exercise regularly (fitter people have a lower resting heart rate)
- Take as many steps as you can each day
- Age also plays a role: older people tend to sleep better (possibly because many of them are retired and no longer working)
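One simple way to double-check the direction of each relationship in the list above is a rank correlation with `Quality of Sleep`. This is a minimal sketch using Spearman correlation (an assumed choice of method, not necessarily the statistics used in this notebook) on hypothetical numbers; on the real data `df` would be replaced by the numeric columns of `raw_data`.

```python
import pandas as pd

# Hypothetical toy data, not the SleepScope results
df = pd.DataFrame({
    "Sleep Duration": [5.9, 6.3, 7.1, 7.6, 8.2],
    "Stress Level": [8, 7, 5, 4, 3],
    "Quality of Sleep": [4, 5, 7, 8, 9],
})

# Spearman rank correlation of every column with sleep quality,
# sorted from most negative to most positive
corr = (
    df.corr(method="spearman")["Quality of Sleep"]
    .drop("Quality of Sleep")
    .sort_values()
)
print(corr)
```

A positive value means the column tends to rise with sleep quality and a negative value means it tends to fall; either way this shows an association, not proof of cause.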
🤖 Answering the challenge 📊
📕 Part 1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV
First, we need to encode the text columns as numbers (one-hot encoding), so that our model can understand them.
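for col in ['BMI Category', 'Sleep Disorder', 'BP_category']:
    # One-hot encode: one indicator column per category value
    encoded = pd.get_dummies(raw_data[col])
    # Add the indicator columns to the data, named after the category values
    raw_data[encoded.columns.to_list()] = encoded.values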
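One thing to watch in the encoding loop above: `pd.get_dummies` names each new column after the category value alone, so if two encoded columns share a label, their dummy columns get the same name and the later one overwrites the earlier. The toy frame below assumes, hypothetically, that `BMI Category` and `BP_category` both contain `"Normal"`; passing `prefix=col` keeps the names distinct.

```python
import pandas as pd

# Hypothetical toy frame in which "Normal" appears in both columns
df = pd.DataFrame({
    "BMI Category": ["Normal", "Obese", "Overweight"],
    "BP_category": ["Normal", "High", "Normal"],
})

# Without a prefix, both columns produce a dummy called "Normal"
plain = [pd.get_dummies(df[c]).columns.to_list() for c in df.columns]
print(plain)

# With prefix=col, every dummy name records which column it came from
for col in ["BMI Category", "BP_category"]:
    encoded = pd.get_dummies(df[col], prefix=col)
    df[encoded.columns.to_list()] = encoded.values
print(df.columns.to_list())
```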
Second, we need to split the data into a training set and a testing set. The ML model learns from the training set and is then tested on the testing set, which it has not seen during training.
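# Features: every non-text column except the target
X = raw_data.select_dtypes(exclude='object').drop('Quality of Sleep', axis=1)
# Target: the sleep quality score we want to predict
y = raw_data['Quality of Sleep'].values
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)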
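The split can be illustrated on a tiny array. This is a standalone sketch with made-up numbers; `random_state=42` is an addition not in the original call, and without it each run produces a different split (and slightly different results).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: row i of X is [2i, 2i + 1] and its target is i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

Each row of `X` stays paired with its `y` value after shuffling, which is why both are passed to `train_test_split` together.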
Third, we need to create a baseline model with no adjustments (default settings), so we have something to improve on later, and use it to make predictions on the testing set so we can measure how accurate they are.
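A baseline of this kind might look like the sketch below, on synthetic integer scores standing in for `Quality of Sleep` (the data and the `random_state` values are assumptions). Because `GradientBoostingRegressor` predicts continuous values while `accuracy_score` compares exact labels, this sketch rounds the predictions before computing accuracy; `mean_absolute_error` can use them directly.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): whole-number scores from 4 to 9
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 3))
y = np.clip(np.round(4 + 5 * X[:, 0]), 4, 9)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline: default hyperparameters, no tuning yet
baseline = GradientBoostingRegressor(random_state=0)
baseline.fit(X_train, y_train)
pred = baseline.predict(X_test)

# MAE works directly on the continuous predictions
mae = mean_absolute_error(y_test, pred)
# accuracy_score needs discrete labels, so round to whole scores first
acc = accuracy_score(y_test, np.round(pred))
print(f"MAE: {mae:.3f}, accuracy: {acc:.2%}")
```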