
Sleep Health and Lifestyle

This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictitious people.

The workspace is set up with one CSV file, data.csv, with the following columns:

  • Person ID
  • Gender
  • Age
  • Occupation
  • Sleep Duration: Average number of hours of sleep per day
  • Quality of Sleep: A subjective rating on a 1-10 scale
  • Physical Activity Level: Average number of minutes the person engages in physical activity daily
  • Stress Level: A subjective rating on a 1-10 scale
  • BMI Category
  • Blood Pressure: Indicated as systolic pressure over diastolic pressure
  • Heart Rate: In beats per minute
  • Daily Steps
  • Sleep Disorder: One of None, Insomnia or Sleep Apnea

Source: Kaggle

1. Data Cleaning and basic exploration

The first and most important step.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import plotly.graph_objects as go
import plotly.figure_factory as ff
sns.set(palette="deep")

df = pd.read_csv('data.csv')
df.head()
1.1 Information and missing values
df.info()
df.isna().sum()
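One thing worth checking when reading the missing-value counts: the Sleep Disorder column uses the literal label None, and depending on the pandas version, read_csv may parse that string as a missing value. The sketch below uses a small in-memory CSV (made up for illustration, not the real data.csv) to show one way to keep the label as text.

```python
import io
import pandas as pd

# Toy CSV standing in for data.csv; the rows are illustrative only.
csv_text = "Person ID,Sleep Disorder\n1,None\n2,Insomnia\n3,Sleep Apnea\n"

# keep_default_na=False turns off the built-in list of NA strings;
# na_values=[""] still treats truly empty cells as missing.
toy = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(toy["Sleep Disorder"].isna().sum())
print(toy["Sleep Disorder"].unique())
```

If the count of missing values in Sleep Disorder looks suspiciously high, this is a likely cause rather than a real gap in the data.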
1.2 Correct spelling
df['Occupation'].unique()
df['Sleep Disorder'].unique()
df['BMI Category'].unique()
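If the unique() calls above reveal labels that differ only by case or stray whitespace, they can be normalized before analysis. This is a minimal sketch on a toy column; the variant spellings are invented for illustration and are not taken from data.csv.

```python
import pandas as pd

# Toy column with hypothetical inconsistent labels (not from the real data).
toy = pd.DataFrame({"BMI Category": ["Normal", "normal ", "Overweight", "Obese"]})

# Strip surrounding whitespace and use title case so variants collapse.
cleaned = toy["BMI Category"].str.strip().str.title()

print(cleaned.unique())
```

Labels that differ in wording rather than formatting would need an explicit mapping instead, for example with Series.replace.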
1.3 Warning about blood pressure and heart rate
df['Blood Pressure'].unique()

Regarding Blood Pressure:

Blood pressure is measured using two numbers:

The first number, called systolic blood pressure, measures the pressure in your arteries when your heart beats.

The second number, called diastolic blood pressure, measures the pressure in your arteries when your heart rests between beats.

Depending on the guideline, the value of a "correct" blood pressure may differ. I will use the classic 120/80 value, but it is important to note that the dataset does not state clearly whether Blood Pressure was a one-time measurement or was measured many times to establish a diagnosis. Therefore, I think this column should be used carefully.

The same comment applies to Heart Rate, as a one-time measurement does not mean much on its own.
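As an alternative way to apply the 120/80 reference value, the two numbers can be split into separate integer columns and compared against the thresholds. This is only a sketch on toy rows: the column names Systolic, Diastolic and BP Elevated are hypothetical, and the rule "above 120 or above 80" is one possible reading of the classic value, so it may flag readings differently from a fixed list of strings.

```python
import pandas as pd

# Toy rows standing in for data.csv; the values are illustrative only.
bp = pd.DataFrame({"Blood Pressure": ["120/80", "126/83", "117/76", "140/95"]})

# Split "systolic/diastolic" into two integer columns.
parts = bp["Blood Pressure"].str.split("/", expand=True).astype(int)
bp["Systolic"] = parts[0]
bp["Diastolic"] = parts[1]

# 0 = at or below 120/80 on both numbers, 1 = above on either number.
bp["BP Elevated"] = ((bp["Systolic"] > 120) | (bp["Diastolic"] > 80)).astype(int)

print(bp)
```

Keeping the numeric columns also leaves room to try other guideline thresholds later without re-parsing the strings.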

# Encode Blood Pressure as 0 for the readings treated as normal here, 1 otherwise
normal_bp = ['120/80', '117/76', '118/76', '115/75']
df['Blood Pressure'] = (~df['Blood Pressure'].isin(normal_bp)).astype(int)
df['Blood Pressure'].dtype