Problem Statement
Sleep quality and duration is difficult to come by as individuals age due to a wide variety of circumstances. Using the Sleep, Health and Lifestyle Table, we can explore a variety of health, environmental and demographic factors that play different roles in affecting an individual's sleep.
This analysis will aim to pinpoint which factors play the largest roles in affecting a person's sleep quality and sleep duraration. We will analyze the correlation between sleep duration, sleep quality and the other numeric variables in this set. For the categorical variables, we will analyze summary statistics partitioned by those variables to see if any significant differences exist. Finally, with sleep disorders being a variable in the table, we will determine if the presence of a sleep disorder serves as a confounding variable in any correlations we examine, as well as the factors that are highly linked with the presence of a sleep disorder.
-- Explore the data in the table
SELECT *
FROM 'Sleep_health_and_lifestyle_dataset.csv'
LIMIT 5Cleaning the Data
-- Rename columns to fit SQL style
-- Merge 'Normal' and 'Normal Weight' under the BMI Category field
SELECT
"Person ID" AS person_id,
Gender AS gender,
Age AS age,
Occupation AS occupation,
"Sleep Duration" AS sleep_duration,
"Quality of Sleep" AS quality_sleep,
"Physical Activity Level" AS activity_level,
"Stress Level" AS stress_level,
REGEXP_REPLACE("BMI Category", ' Weight', '') AS bmi_category,
"Sleep Disorder" AS sleep_disorder
FROM "Sleep_health_and_lifestyle_dataset.csv"Question 1: How much do sleep duration and sleep quality correlate with each other?
-- Correlation between sleep duration and sleep quality
SELECT corr(sleep_duration, quality_sleep) AS duration_quality_correlation
FROM cleaned_sleepQuestion 2: How do sleep duration and sleep quality vary based on gender?
-- Summary stats (average, median, standard deviation) for sleep duration and sleep quality split by gender
SELECT
gender,
COUNT(*) AS total,
ROUND(AVG(sleep_duration),2) AS avg_duration_gender,
ROUND(AVG(quality_sleep),2) AS avg_quality_gender,
percentile_cont(0.5) WITHIN GROUP (ORDER BY sleep_duration) AS median_duration_gender,
percentile_cont(0.5) WITHIN GROUP (ORDER BY quality_sleep) AS median_quality_gender,
ROUND(STDDEV(sleep_duration),2) AS sd_duration_gender,
ROUND(STDDEV(quality_sleep),2) AS sd_quality_gender
FROM cleaned_sleep
GROUP BY genderQuestion 3: How do sleep duration and sleep quality vary based on occupation?
-- Summary stats (average, median, standard deviation) for sleep duration and sleep quality split by occupation.
-- Only occupations with at least 5 respondents were included. Those excluded were Scientist (4), Software Engineer (4), Sales Representative (2) and Manager (1).
SELECT
occupation,
COUNT(*) AS total,
ROUND(AVG(age),2) AS avg_age,
ROUND(AVG(sleep_duration),2) AS avg_duration_job,
ROUND(AVG(quality_sleep),2) AS avg_quality_job,
percentile_cont(0.5) WITHIN GROUP (ORDER BY sleep_duration) AS median_duration_job,
percentile_cont(0.5) WITHIN GROUP (ORDER BY quality_sleep) AS median_quality_job,
ROUND(STDDEV(sleep_duration),2) AS sd_duration_job,
ROUND(STDDEV(quality_sleep),2) AS sd_quality_job
FROM cleaned_sleep
GROUP BY occupation
HAVING COUNT(occupation) >= 5
ORDER BY avg_quality_job DESCQuestion 4: How do sleep duration and sleep quality vary based on BMI category?
-- Summary stats (average, median, standard deviation) for sleep duration and sleep quality split by BMI Category.
SELECT
bmi_category,
COUNT(*) AS total,
ROUND(AVG(sleep_duration),2) AS avg_duration_bmi,
ROUND(AVG(quality_sleep),2) AS avg_quality_bmi,
percentile_cont(0.5) WITHIN GROUP (ORDER BY sleep_duration) AS median_duration_bmi,
percentile_cont(0.5) WITHIN GROUP (ORDER BY quality_sleep) AS median_quality_bmi,
ROUND(STDDEV(sleep_duration),2) AS sd_duration_bmi,
ROUND(STDDEV(quality_sleep),2) AS sd_quality_bmi
FROM cleaned_sleep
GROUP BY bmi_category
ORDER BY avg_quality_bmi DESCQuestion 5: How do sleep duration and sleep quality correlate with age?
SELECT
corr(sleep_duration, age) AS duration_age_correlation,
corr(quality_sleep, age) AS quality_age_correlation,
MIN(age) AS youngest_age,
MAX(age) AS oldest_age,
AVG(age) AS avg_age,
STDDEV(age) AS sd_age
FROM cleaned_sleepQuestion 6: Do sleep duration and sleep quality correlate more highly with physical activity level or with stress level?