Sleep Health and Lifestyle
This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictional people.
The workspace is set up with one CSV file, data.csv, with the following columns:
- Person ID
- Gender
- Age
- Occupation
- Sleep Duration: Average number of hours of sleep per day
- Quality of Sleep: A subjective rating on a 1-10 scale
- Physical Activity Level: Average number of minutes the person engages in physical activity daily
- Stress Level: A subjective rating on a 1-10 scale
- BMI Category
- Blood Pressure: Indicated as systolic pressure over diastolic pressure
- Heart Rate: In beats per minute
- Daily Steps
- Sleep Disorder: One of None, Insomnia or Sleep Apnea
Source: Kaggle
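Because Quality of Sleep and Stress Level are documented as 1-10 ratings, a quick range check can catch entry errors before any analysis. The sketch below runs on a small hypothetical DataFrame (one value is deliberately out of range); with the real file, the same expression would be applied to the DataFrame loaded from data.csv.

```python
import pandas as pd

# Hypothetical rows; with the real file, use df = pd.read_csv('data.csv') instead
sample = pd.DataFrame({
    "Quality of Sleep": [6, 9, 4],
    "Stress Level": [8, 3, 11],  # 11 is deliberately outside the 1-10 scale
})

rating_cols = ["Quality of Sleep", "Stress Level"]

# True wherever a rating falls outside the documented 1-10 scale
out_of_range = (sample[rating_cols] < 1) | (sample[rating_cols] > 10)

# Number of out-of-range values per column
print(out_of_range.sum())
```

A count of zero for every column means both ratings respect their documented scale.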
1. Data Cleaning and basic exploration
This is the first and most important step.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import plotly.graph_objects as go
import plotly.figure_factory as ff
sns.set(palette="deep")
df = pd.read_csv('data.csv')
df.head()
1.1 Information and missing values
df.info()
df.isna().sum()
1.2 Correct spelling
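One pitfall worth checking here: in recent pandas versions (2.0 and later), read_csv treats the literal string "None" as a missing value by default, so the "None" category of Sleep Disorder can show up in df.isna().sum() as if data were missing. A minimal sketch, using a hypothetical three-row sample, of reading the file so that "None" stays a category and only empty fields count as missing:

```python
import io

import pandas as pd

# Hypothetical sample mimicking the data.csv layout (not real data)
sample = io.StringIO(
    "Person ID,Sleep Disorder\n"
    "1,None\n"
    "2,Insomnia\n"
    "3,Sleep Apnea\n"
)

# Disable the default NA strings and treat only empty fields as missing
df_sample = pd.read_csv(sample, keep_default_na=False, na_values=[""])

print(df_sample["Sleep Disorder"].isna().sum())
print(df_sample["Sleep Disorder"].unique())
```

If the file has already been loaded with the defaults, df['Sleep Disorder'].fillna('None') restores the category afterwards.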
df['Occupation'].unique()
df['Sleep Disorder'].unique()
df['BMI Category'].unique()
1.3 Warning about blood pressure and heart rate
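If the unique() calls reveal the same category spelled in two ways (a pairing such as "Normal" and "Normal Weight" in BMI Category is assumed here purely for illustration), the variants can be merged with Series.replace. A minimal sketch on a hypothetical series; the mapping dictionary is an assumption to adapt to whatever unique() actually shows:

```python
import pandas as pd

# Hypothetical values; in the notebook this would be df['BMI Category']
bmi = pd.Series(["Normal", "Normal Weight", "Overweight", " Obese"], name="BMI Category")

# Assumed mapping from spelling variants to one canonical label
canonical = {"Normal Weight": "Normal"}

# Strip stray whitespace first, then merge the variants
bmi_clean = bmi.str.strip().replace(canonical)
print(bmi_clean.unique())
```

Stripping before replacing ensures that labels differing only by surrounding spaces also match the mapping keys.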
df['Blood Pressure'].unique()
Regarding Blood Pressure:
Blood pressure is measured using two numbers:
The first number, called systolic blood pressure, measures the pressure in your arteries when your heart beats.
The second number, called diastolic blood pressure, measures the pressure in your arteries when your heart rests between beats.
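Since the column stores both numbers in one "systolic/diastolic" string, splitting it into two integer columns makes each pressure usable on its own. A minimal sketch on hypothetical readings in the same format; the column names Systolic and Diastolic are assumptions:

```python
import pandas as pd

# Hypothetical readings in the same "systolic/diastolic" format as the dataset
bp = pd.Series(["126/83", "120/80", "140/90"], name="Blood Pressure")

# Split on "/" into two columns and convert both to integers
parts = bp.str.split("/", expand=True).astype(int)
parts.columns = ["Systolic", "Diastolic"]
print(parts)
```

With the real data, the two new columns could be joined back with pd.concat([df, parts], axis=1) before the original string column is overwritten.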
Depending on the guideline, the threshold for a "normal" blood pressure may change. I will use the classic "120/80" value, but it is important to note that the dataset does not clearly state whether Blood Pressure was a one-time measurement or was taken several times to establish a diagnosis. Therefore, I think this column should be used with caution.
The same caveat applies to Heart Rate: a single measurement says little on its own, since heart rate varies with activity, stress and time of day.
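# Encode blood pressure as a binary flag:
# 0 = one of the four listed readings (all at or below 120/80), 1 = any other reading
df['Blood Pressure'] = df['Blood Pressure'].apply(lambda x: 0 if x in ['120/80', '117/76', '118/76', '115/75'] else 1)
df['Blood Pressure'].dtypes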
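The hard-coded list only recognises four exact readings, so any other reading at or below 120/80 would still be flagged as 1. A threshold-based variant avoids that. The sketch below assumes the rule "normal if systolic is at most 120 and diastolic is at most 80", mirroring the 120/80 choice above; the helper name bp_flag is hypothetical, and it must be applied to the raw string column before that column is overwritten.

```python
import pandas as pd


def bp_flag(reading: str, sys_max: int = 120, dia_max: int = 80) -> int:
    """Hypothetical helper: 0 if the reading is at or below sys_max/dia_max, else 1."""
    systolic, diastolic = (int(v) for v in reading.split("/"))
    return 0 if systolic <= sys_max and diastolic <= dia_max else 1


# Hypothetical raw readings (strings, as in data.csv before encoding)
raw = pd.Series(["120/80", "118/75", "126/83", "115/82"])
flags = raw.apply(bp_flag)
print(flags.tolist())  # [0, 0, 1, 1]
```

Keeping the thresholds as parameters makes it easy to test a stricter guideline without rewriting the function.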