Skip to content

Problem Statement

Sleep quality and duration is difficult to come by as individuals age due to a wide variety of circumstances. Using the Sleep, Health and Lifestyle Table, we can explore a variety of health, environmental and demographic factors that play different roles in affecting an individual's sleep.

This analysis will aim to pinpoint which factors play the largest roles in affecting a person's sleep quality and sleep duraration. We will analyze the correlation between sleep duration, sleep quality and the other numeric variables in this set. For the categorical variables, we will analyze summary statistics partitioned by those variables to see if any significant differences exist. Finally, with sleep disorders being a variable in the table, we will determine if the presence of a sleep disorder serves as a confounding variable in any correlations we examine, as well as the factors that are highly linked with the presence of a sleep disorder.

Spinner
DataFrameas
df
variable
-- Explore the data in the table
SELECT *
FROM 'Sleep_health_and_lifestyle_dataset.csv'
LIMIT 5

Cleaning the Data

Spinner
DataFrameas
cleaned_sleep
variable
-- Rename columns to fit SQL style
-- Merge 'Normal' and 'Normal Weight' under the BMI Category field

SELECT 
	"Person ID" AS person_id,
	Gender AS gender,
	Age AS age,
	Occupation AS occupation,
	"Sleep Duration" AS sleep_duration,
	"Quality of Sleep" AS quality_sleep,
	"Physical Activity Level" AS activity_level,
	"Stress Level" AS stress_level,
	REGEXP_REPLACE("BMI Category", ' Weight', '') AS bmi_category,
	"Sleep Disorder" AS sleep_disorder
FROM "Sleep_health_and_lifestyle_dataset.csv"

Question 1: How much do sleep duration and sleep quality correlate with each other?

Spinner
DataFrameas
df1
variable
Run cancelled
-- Correlation between sleep duration and sleep quality

SELECT corr(sleep_duration, quality_sleep) AS duration_quality_correlation
FROM cleaned_sleep

Question 2: How do sleep duration and sleep quality vary based on gender?

Spinner
DataFrameas
df2
variable
Run cancelled
-- Summary stats (average, median, standard deviation) for sleep duration and sleep quality split by gender

SELECT
	gender,
	COUNT(*) AS total,
	ROUND(AVG(sleep_duration),2) AS avg_duration_gender,
	ROUND(AVG(quality_sleep),2) AS avg_quality_gender,
	percentile_cont(0.5) WITHIN GROUP (ORDER BY sleep_duration) AS median_duration_gender,
	percentile_cont(0.5) WITHIN GROUP (ORDER BY quality_sleep) AS median_quality_gender,
	ROUND(STDDEV(sleep_duration),2) AS sd_duration_gender,
	ROUND(STDDEV(quality_sleep),2) AS sd_quality_gender
FROM cleaned_sleep
GROUP BY gender

Question 3: How do sleep duration and sleep quality vary based on occupation?

Spinner
DataFrameas
df3
variable
-- Summary stats (average, median, standard deviation) for sleep duration and sleep quality split by occupation.

-- Only occupations with at least 5 respondents were included. Those excluded were Scientist (4), Software Engineer (4), Sales Representative (2) and Manager (1).

SELECT
	occupation,
	COUNT(*) AS total,
	ROUND(AVG(age),2) AS avg_age,
	ROUND(AVG(sleep_duration),2) AS avg_duration_job,
	ROUND(AVG(quality_sleep),2) AS avg_quality_job,
	percentile_cont(0.5) WITHIN GROUP (ORDER BY sleep_duration) AS median_duration_job,
	percentile_cont(0.5) WITHIN GROUP (ORDER BY quality_sleep) AS median_quality_job,
	ROUND(STDDEV(sleep_duration),2) AS sd_duration_job,
	ROUND(STDDEV(quality_sleep),2) AS sd_quality_job
FROM cleaned_sleep
GROUP BY occupation
HAVING COUNT(occupation) >= 5
ORDER BY avg_quality_job DESC

Question 4: How do sleep duration and sleep quality vary based on BMI category?

Spinner
DataFrameas
df5
variable
Run cancelled
-- Summary stats (average, median, standard deviation) for sleep duration and sleep quality split by BMI Category.

SELECT
	bmi_category,
	COUNT(*) AS total,
	ROUND(AVG(sleep_duration),2) AS avg_duration_bmi,
	ROUND(AVG(quality_sleep),2) AS avg_quality_bmi,
	percentile_cont(0.5) WITHIN GROUP (ORDER BY sleep_duration) AS median_duration_bmi,
	percentile_cont(0.5) WITHIN GROUP (ORDER BY quality_sleep) AS median_quality_bmi,
	ROUND(STDDEV(sleep_duration),2) AS sd_duration_bmi,
	ROUND(STDDEV(quality_sleep),2) AS sd_quality_bmi
FROM cleaned_sleep
GROUP BY bmi_category
ORDER BY avg_quality_bmi DESC

Question 5: How do sleep duration and sleep quality correlate with age?

Spinner
DataFrameas
df4
variable
Run cancelled
SELECT
	corr(sleep_duration, age) AS duration_age_correlation,
	corr(quality_sleep, age) AS quality_age_correlation,
	MIN(age) AS youngest_age,
	MAX(age) AS oldest_age,
	AVG(age) AS avg_age,
	STDDEV(age) AS sd_age
FROM cleaned_sleep

Question 6: Do sleep duration and sleep quality correlate more highly with physical activity level or with stress level?