SleepInc: Helping you find better sleep
Background
Your client is SleepInc, a sleep health company that recently launched a sleep-tracking app called SleepScope. The app monitors sleep patterns and collects users' self-reported data on lifestyle habits. SleepInc wants to identify lifestyle, health, and demographic factors that strongly correlate with poor sleep quality. They need your help to produce visualizations and a summary of findings for their next board meeting! They need these to be easily digestible for a non-technical audience!
Preview of data
import pandas as pd
raw_data = pd.read_csv('sleep_health_data.csv')
raw_data
Executive Summary
Aim:
To find out which lifestyle, health, and demographic factors affect sleep quality.
Method:
- Validate all of the data
- Train a machine learning model
- Assess how well it performs
- Tune it so that its predictions (hopefully) improve
- Look at each column's impact on the predictions (the feature importance calculated by the model)
- Chart the columns that turn out to be significant
- Double-check the model's findings with statistics
- Form a conclusion
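The tuning and column-impact steps in this list could be sketched with scikit-learn's `GridSearchCV` and the fitted model's `feature_importances_`. This is a minimal sketch under stated assumptions, not the notebook's own code: the helper name `tune_and_rank` and the parameter grid are hypothetical, and the demo runs on randomly generated data rather than the SleepScope data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV


def tune_and_rank(X: pd.DataFrame, y, cv: int = 5):
    """Grid-search a GradientBoostingRegressor, then rank columns by importance.

    The parameter grid is an illustrative assumption, not the grid used
    in the original analysis.
    """
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [2, 3],
        "learning_rate": [0.05, 0.1],
    }
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid,
        # scikit-learn maximises scores, so MAE is passed in negated form
        scoring="neg_mean_absolute_error",
        cv=cv,
    )
    search.fit(X, y)
    # Importances of the best model (refit on all of X by default); they sum to 1
    importances = pd.Series(
        search.best_estimator_.feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    return search.best_params_, importances


# Demo on random data (NOT the SleepScope data): only column "a" drives y
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y_demo = 3 * X_demo["a"] + rng.normal(scale=0.1, size=200)
best_params, importances = tune_and_rank(X_demo, y_demo)
print(best_params)
print(importances)
```

Gradient-boosting importances are impurity-based: they show how much the model relied on a column, not whether that column raises or lowers sleep quality, which is one reason to double-check the findings with statistics.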
Results:
According to the machine learning model and the statistics, to get better sleep you should:
- Sleep longer (or go to bed earlier)
- Reduce stress (try to relax)
- Stay physically fit by exercising (which goes with a lower resting heart rate)
- Take as many steps as possible each day
- Also worth noting: older people tend to sleep better (possibly because many are retired and no longer working)
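One way to do the statistical double-check behind these findings is a Spearman rank correlation between each factor and `Quality of Sleep`; rank correlation only assumes a monotonic relationship, which suits an ordinal score. This is a sketch, not necessarily the method the notebook used: the column names are assumed to match `sleep_health_data.csv` (with `Heart Rate` assumed to hold the resting heart rate), the helper `spearman_vs_quality` is hypothetical, and the toy rows are invented purely to show the call.

```python
import pandas as pd

# ASSUMPTION: these column names match sleep_health_data.csv; adjust if they differ
FACTORS = ["Sleep Duration", "Stress Level", "Heart Rate", "Daily Steps", "Age"]


def spearman_vs_quality(df: pd.DataFrame, target: str = "Quality of Sleep") -> pd.Series:
    """Spearman rank correlation of each factor with the sleep-quality score."""
    # corrwith pairs every factor column with the target column
    return df[FACTORS].corrwith(df[target], method="spearman").sort_values()


# Invented rows purely to show the call; these are NOT SleepScope results
toy = pd.DataFrame({
    "Sleep Duration": [6.0, 6.5, 7.0, 7.5, 8.0],
    "Stress Level": [8, 7, 6, 4, 3],
    "Heart Rate": [80, 76, 72, 70, 65],
    "Daily Steps": [3000, 4000, 5000, 8000, 10000],
    "Age": [28, 35, 42, 50, 58],
    "Quality of Sleep": [4, 5, 6, 8, 9],
})
result = spearman_vs_quality(toy)
print(result)
```

A negative coefficient means higher values of that factor go with lower sleep quality. Correlation on its own does not show cause and effect.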
Answering the challenge
Part 1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, accuracy_score
from sklearn.model_selection import GridSearchCV
First, we need to encode the text columns as numbers so that our model can use them.
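As a small illustration of what `pd.get_dummies` produces, the sketch below one-hot encodes two columns of hypothetical values. It also shows why a prefix can matter: if the same category label appears in two different columns, encoding each column on its own yields two dummy columns with the same name, and assigning both into one DataFrame would overwrite the first.

```python
import pandas as pd

# Hypothetical rows; in the notebook these columns come from raw_data
df = pd.DataFrame({
    "BMI Category": ["Normal", "Overweight", "Obese"],
    "BP_category": ["Normal", "High", "High"],
})

# Encoding each column on its own: "Normal" appears in both results
bmi_plain = pd.get_dummies(df["BMI Category"])
bp_plain = pd.get_dummies(df["BP_category"])
print(sorted(set(bmi_plain.columns) & set(bp_plain.columns)))  # ['Normal']

# With columns=..., pandas prefixes each dummy with its source column name,
# e.g. "BMI Category_Normal" and "BP_category_Normal"
encoded = pd.get_dummies(df, columns=["BMI Category", "BP_category"])
print(encoded.columns.to_list())
```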
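for col in ['BMI Category', 'Sleep Disorder', 'BP_category']:
    # One-hot encode each text column; the prefix stops identical category
    # names in different columns from overwriting each other
    encoded = pd.get_dummies(raw_data[col], prefix=col)
    raw_data[encoded.columns.to_list()] = encoded.values
Second, we need to create training and testing sets. The training set is what the ML model learns from, and the test set is held back to measure how well the model performs on data it has not seen.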
# Features: every non-text column except the target
X = raw_data.select_dtypes(exclude='object').drop('Quality of Sleep', axis=1)
y = raw_data['Quality of Sleep'].values
# Hold back 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Third, we create a baseline model with no adjustments (default settings). Its predictions on the test set tell us how accurate it is and give us a score to improve on later.
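A baseline along these lines might look like the sketch below. It assumes the `X_train`, `X_test`, `y_train`, `y_test` variables from the split; to stay self-contained, the demo builds them from random data instead. Because `GradientBoostingRegressor` outputs continuous values, `accuracy_score` (imported above) rejects its raw predictions, so the sketch rounds them to whole scores first. That rounding is an assumption about how accuracy would be measured; mean absolute error works on the raw predictions. The helper `baseline_scores` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split


def baseline_scores(X_train, X_test, y_train, y_test, random_state=0):
    """Fit an untuned GradientBoostingRegressor and score it on the test set."""
    model = GradientBoostingRegressor(random_state=random_state)  # default hyperparameters
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    mae = mean_absolute_error(y_test, preds)
    # accuracy_score needs discrete labels, so round predictions to the nearest whole score
    acc = accuracy_score(y_test, np.rint(preds).astype(int))
    return model, mae, acc


# Random stand-in data (NOT the SleepScope data): integer scores from 4 to 9
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = np.clip(np.rint(6.5 + 1.5 * X[:, 0]), 4, 9).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model, mae, acc = baseline_scores(X_train, X_test, y_train, y_test)
print(f"Baseline MAE: {mae:.2f}, rounded accuracy: {acc:.2%}")
```

MAE is the easier number to explain to a non-technical audience: it is the average distance, in sleep-quality points, between the predicted and reported scores.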