Sleep Health and Lifestyle
This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictitious people.
The workspace is set up with one CSV file, data.csv, with the following columns:
- Person ID
- Gender
- Age
- Occupation
- Sleep Duration: Average number of hours of sleep per day
- Quality of Sleep: A subjective rating on a 1-10 scale
- Physical Activity Level: Average number of minutes the person engages in physical activity daily
- Stress Level: A subjective rating on a 1-10 scale
- BMI Category
- Blood Pressure: Indicated as systolic pressure over diastolic pressure
- Heart Rate: In beats per minute
- Daily Steps
- Sleep Disorder: One of None, Insomnia or Sleep Apnea
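Because Blood Pressure is stored as text in systolic-over-diastolic form, it has to be split into two numbers before it can be averaged or compared. A minimal Python sketch, assuming the values are written with a `/` separator such as `126/83` (the helper name `split_blood_pressure` and the sample value are illustrative and not taken from the dataset):

```python
def split_blood_pressure(value: str) -> tuple[int, int]:
    """Split a 'systolic/diastolic' string into two integers."""
    systolic, diastolic = value.strip().split("/")
    return int(systolic), int(diastolic)


# Illustrative value only; the "/" separator is an assumption about the format.
systolic, diastolic = split_blood_pressure("126/83")
print(systolic, diastolic)
```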
Source: Kaggle
🌎 Some guiding questions to help you explore this data:
- Which factors could contribute to a sleep disorder?
- Does an increased physical activity level result in a better quality of sleep?
- Does the presence of a sleep disorder affect the subjective sleep quality metric?
Exploratory overview
SELECT *
FROM 'data.csv'
LIMIT 5;
SELECT
COUNT(*) - COUNT("Person ID") AS null_person_id,
COUNT(*) - COUNT("Gender") AS null_gender,
COUNT(*) - COUNT("Age") AS null_age,
COUNT(*) - COUNT("Occupation") AS null_occupation,
COUNT(*) - COUNT("Sleep Duration") AS null_sleep_duration,
COUNT(*) - COUNT("Quality of Sleep") AS null_quality_of_sleep,
COUNT(*) - COUNT("Physical Activity Level") AS null_physical_activity_level,
COUNT(*) - COUNT("Stress Level") AS null_stress_level,
COUNT(*) - COUNT("BMI Category") AS null_bmi_category,
COUNT(*) - COUNT("Blood Pressure") AS null_blood_pressure,
COUNT(*) - COUNT("Heart Rate") AS null_heart_rate,
COUNT(*) - COUNT("Daily Steps") AS null_daily_steps,
COUNT(*) - COUNT("Sleep Disorder") AS null_sleep_disorder
FROM data.csv;
There are no null values in the dataset.
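The null check relies on `COUNT(*)` counting every row while `COUNT(column)` skips NULLs, so their difference is the number of missing values in that column. The sketch below illustrates this on a tiny in-memory table; it uses Python's built-in `sqlite3` as a stand-in engine (an assumption, not necessarily what this workspace runs), and the table name `people` and its three rows are made up for the demonstration:

```python
import sqlite3

# In-memory table with made-up rows (not from data.csv); one Age is NULL.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE people ("Person ID" INTEGER, "Age" INTEGER)')
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, 30), (2, None), (3, 45)],
)

# COUNT(*) counts every row; COUNT(column) skips NULLs in that column.
null_person_id, null_age = conn.execute(
    'SELECT COUNT(*) - COUNT("Person ID"), COUNT(*) - COUNT("Age") FROM people'
).fetchone()
conn.close()

print(null_person_id, null_age)
```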
What kind of distribution do we have in the dataset?
SELECT
"Age",
COUNT(*) AS Count
FROM data.csv
GROUP BY "Age"
ORDER BY "Age";
Patients are reasonably well distributed across the age range.
SELECT
"Sleep Disorder",
"Gender",
COUNT(*) AS count
FROM data.csv
GROUP BY CUBE ("Sleep Disorder", "Gender")
ORDER BY "Sleep Disorder", "Gender";
Sleep disorder summary:
- Insomnia: 77 patients in total (36 female, 41 male)
- None: 219 patients in total (82 female, 137 male)
- Sleep Apnea: 78 patients in total (67 female, 11 male)
- All patients: 374 in total (185 female, 189 male)
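Raw counts are hard to compare across groups of different sizes, so it can help to turn them into shares. A small Python sketch that computes the percentage of women in each group, using only the counts from the summary above (the variable names are illustrative):

```python
# Counts copied from the summary above: disorder -> (female, male).
counts = {
    "Insomnia": (36, 41),
    "None": (82, 137),
    "Sleep Apnea": (67, 11),
}

female_share = {}
for disorder, (female, male) in counts.items():
    total = female + male
    # Percentage of women within this group, rounded to one decimal place.
    female_share[disorder] = round(100 * female / total, 1)
    print(f"{disorder}: {female_share[disorder]}% female of {total} patients")
```

With these counts, about 86% of the sleep apnea group is female, compared with roughly 47% of the insomnia group and 37% of the group without a disorder.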
Which factors could contribute to a sleep disorder?
Analysis of numeric variables
SELECT
COALESCE("Sleep Disorder", 'Total') AS "Sleep Disorder",
AVG("Sleep Duration") AS avg_sleep,
AVG("Quality of Sleep") AS avg_quality,
MAX("Sleep Duration") AS MAX_sleep,
MAX("Quality of Sleep") AS MAX_quality,
MIN("Sleep Duration") AS MIN_sleep,
MIN("Quality of Sleep") AS MIN_quality
FROM data.csv
GROUP BY ROLLUP ("Sleep Disorder");
Compared with people without a sleep disorder, both the average sleep duration and the average sleep quality are lower for people with a disorder, and the drop is most pronounced for insomnia.