Sleep Health and Lifestyle
This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictional people.
The workspace is set up with one CSV file, data.csv, with the following columns:
- Person ID
- Gender
- Age
- Occupation
- Sleep Duration: Average number of hours of sleep per day
- Quality of Sleep: A subjective rating on a 1-10 scale
- Physical Activity Level: Average number of minutes the person engages in physical activity daily
- Stress Level: A subjective rating on a 1-10 scale
- BMI Category
- Blood Pressure: Indicated as systolic pressure over diastolic pressure
- Heart Rate: In beats per minute
- Daily Steps
- Sleep Disorder: One of None, Insomnia, or Sleep Apnea
Source: Kaggle
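Because Blood Pressure is stored as a single "systolic over diastolic" string, it has to be split before either value can be used numerically. The following is a minimal base R sketch of that step. The data frame `bp_example` and its values are made up for illustration and are not rows from data.csv.

```r
# Toy rows standing in for data.csv; the values are invented for illustration.
bp_example <- data.frame(
  person_id = 1:3,
  blood_pressure = c("126/83", "125/80", "140/90"),
  stringsAsFactors = FALSE
)

# Split each "systolic/diastolic" string on the slash.
bp_parts <- strsplit(bp_example$blood_pressure, "/", fixed = TRUE)

# Take the first and second piece of every split and convert them to numbers.
bp_example$systolic  <- as.numeric(sapply(bp_parts, `[`, 1))
bp_example$diastolic <- as.numeric(sapply(bp_parts, `[`, 2))

bp_example
```

The same idea should carry over to the real column, assuming every entry follows the "number/number" pattern.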
🌎 Questions Being Explored:
- Do people sleep longer as they age, and does this differ by Gender?
- Do occurrences of sleep disorders depend on either Age or Gender?
Sleep Duration by Age for Men and Women
Do people sleep longer as they age, and does this differ by Gender?
Showcase of ggplot2 and dplyr skill
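library(ggplot2)
library(dplyr)
# sleep_data is assumed to have been loaded from data.csv in an earlier cell,
# with the original column names (including spaces) preserved.
# Keep only the columns needed here and give them short names.
sleep_age <- sleep_data %>%
  select("Person ID", "Gender", "Age", "Sleep Duration") %>%
  rename(person_id = "Person ID", gender = "Gender", age = "Age", duration = "Sleep Duration")
sleep_age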
# Average age and average sleep duration of the male responses.
avg_male_age <- sleep_age %>%
  filter(gender == "Male") %>%
  summarise(avg_age = mean(age)) %>%
  pull(avg_age)
avg_male_duration <- sleep_age %>%
  filter(gender == "Male") %>%
  summarise(avg_duration = mean(duration)) %>%
  pull(avg_duration)
# Average age and average sleep duration of the female responses.
avg_female_age <- sleep_age %>%
  filter(gender == "Female") %>%
  summarise(avg_age = mean(age)) %>%
  pull(avg_age)
avg_female_duration <- sleep_age %>%
  filter(gender == "Female") %>%
  summarise(avg_duration = mean(duration)) %>%
  pull(avg_duration)
# Base plot: age on the x-axis, sleep duration on the y-axis.
sleep_age_graph <- sleep_age %>%
  ggplot(aes(x = age, y = duration)) +
  labs(y = "Sleep Duration", x = "Age", linetype = "Trend Lines")
# One point per response, plus a separate linear trend line for each gender.
sleep_age_graph +
  geom_point(size = 4, aes(shape = gender, color = gender), alpha = 0.9) +
  scale_shape_manual(values = c(19, 17), name = "Gender",
                     labels = c("Female", "Male")) +
  scale_color_manual(values = c("red", "navy"), name = "Gender",
                     labels = c("Female", "Male")) +
  geom_smooth(method = "lm", se = FALSE, color = "black", aes(linetype = gender))
Conclusions
In general, people tend to sleep more as they age, and this increase is much larger among Female responses. However, the sample contains noticeably more older Female responses and more younger Male responses.
This leads me to believe that this data may not be reliable for this specific question. The discrepancy is explored in the graph below:
# Grey out the individual responses and highlight each gender's average response.
sleep_age_graph +
  geom_point(size = 2, color = "gray", aes(shape = gender)) +
  geom_smooth(method = "lm", se = FALSE, color = "gray", aes(linetype = gender)) +
  # annotate() draws each average once instead of once per row of sleep_age.
  annotate(geom = "point", y = avg_male_duration, x = avg_male_age, color = "navy", size = 5, shape = 2) +
  annotate(geom = "text", label = "Avg Male Response", y = avg_male_duration + 0.1, x = avg_male_age, color = "navy", size = 5) +
  annotate(geom = "point", y = avg_female_duration, x = avg_female_age, color = "red", size = 5, shape = 1) +
  annotate(geom = "text", label = "Avg Female Response", y = avg_female_duration + 0.1, x = avg_female_age, color = "red", size = 5) +
  theme(legend.position = "none")
The above graph highlights this difference in the average age of respondents. The average male respondent is roughly 10 years younger than the average female respondent, which suggests the dataset is not necessarily reliable for this question. To answer it properly, my suggestion is to record more responses from older males and from younger females.
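The age imbalance can also be checked numerically with a single grouped summary that reports the number of responses, the average age, and the average sleep duration per gender. The sketch below uses a small invented data frame, `toy_sleep_age`, in place of `sleep_age`, so its numbers are illustrative only.

```r
library(dplyr)

# Toy stand-in for sleep_age; the values are invented for illustration only.
toy_sleep_age <- tibble(
  gender   = c("Male", "Male", "Male", "Female", "Female", "Female"),
  age      = c(28, 32, 36, 45, 50, 55),
  duration = c(6.0, 6.5, 7.0, 7.0, 7.5, 8.0)
)

# One row per gender: sample size, mean age, and mean sleep duration.
gender_summary <- toy_sleep_age %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    avg_age = mean(age),
    avg_duration = mean(duration),
    .groups = "drop"
  )

gender_summary
```

Run against the real `sleep_age`, a table like this would show at a glance how far apart the two average ages are.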
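An alternative to collecting more responses is to group the existing ones into age buckets and compare the average sleep duration within each bucket: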
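# Make sure age is numeric before binning.
sleep_age$age <- as.numeric(as.character(sleep_age$age))
# Youngest and oldest response overall (one row each, ties ignored).
sleep_age %>%
  slice_min(age, n = 1, with_ties = FALSE)
sleep_age %>%
  slice_max(age, n = 1, with_ties = FALSE)
# Youngest female response and oldest male response.
sleep_age %>%
  filter(gender == "Female") %>%
  slice_min(age, n = 1, with_ties = FALSE)
sleep_age %>%
  filter(gender == "Male") %>%
  slice_max(age, n = 1, with_ties = FALSE)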
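# Assign each response to a five-year age bucket. cut() uses right-closed
# intervals, so bucket 1 is (25, 30], bucket 2 is (30, 35], and so on;
# ages outside (25, 55] become NA.
sleep_age_strat <- sleep_age %>%
  mutate(age_group = cut(age, breaks = c(25, 30, 35, 40, 45, 50, 55), labels = FALSE))
# Average sleep duration for each age bucket and gender.
sleep_age_strat <- sleep_age_strat %>%
  group_by(age_group, gender) %>%
  summarize(avg_dur = mean(duration), .groups = "drop")
sleep_age_strat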
# Base plot: age bucket on the x-axis, average sleep duration on the y-axis.
sleep_age_graph_strat <- sleep_age_strat %>%
  ggplot(aes(y = avg_dur, x = age_group, shape = gender)) +
  labs(y = "Sleep Duration", x = "Age Group", linetype = "Trend Lines")
# One point per bucket and gender, plus a linear trend line for each gender.
sleep_age_graph_strat +
  geom_point(size = 4, aes(shape = gender, color = gender), alpha = 0.9) +
  scale_shape_manual(values = c(19, 17), name = "Gender",
                     labels = c("Female", "Male")) +
  scale_color_manual(values = c("red", "navy"), name = "Gender",
                     labels = c("Female", "Male")) +
  geom_smooth(method = "lm", se = FALSE, color = "black", aes(linetype = gender))
The above graph and data transformation group the ages into buckets of five years each. The average sleep duration for each bucket is then calculated and plotted. This indicates that male and female sleep duration increase at roughly the same rate with age; however, male responses record a longer sleep duration in general, regardless of age group.
The limitation of this approach is that some groups contain very few or no responses for one of the genders, especially the youngest and oldest groups.
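One way to see how sparse the buckets are is to tabulate the number of responses per age bucket and gender, keeping empty combinations visible as zeros. The sketch below uses base R and an invented data frame, `toy_ages`, rather than the real data; only the break points are taken from the code above.

```r
# Toy stand-in for sleep_age; the ages are invented for illustration only.
toy_ages <- data.frame(
  gender = c("Male", "Male", "Male", "Female", "Female", "Female"),
  age    = c(28, 29, 33, 44, 52, 58),
  stringsAsFactors = FALSE
)

# Same breaks as above, but with the default interval labels such as "(25,30]".
toy_ages$age_group <- cut(toy_ages$age, breaks = c(25, 30, 35, 40, 45, 50, 55))

# Cross-tabulate bucket by gender. Empty combinations show up as 0, and
# useNA = "ifany" adds a row for ages that fall outside the breaks.
bucket_counts <- table(toy_ages$age_group, toy_ages$gender, useNA = "ifany")

bucket_counts
```

In this toy example the age of 58 lies above the last break, so it lands in the NA row instead of a bucket. Any real response older than 55 would be treated the same way by these breaks.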
Incidence of Sleep Disorder by Age and Gender
Do occurrences of sleep disorders depend on either Age or Gender?
Showcase of dplyr and ggplot2 skill
# Count the responses for every gender / sleep disorder combination.
disorder_by_gender <- sleep_data %>%
  select(Gender, "Sleep Disorder") %>%
  rename(gender = Gender, disorder = "Sleep Disorder") %>%
  count(gender, disorder, name = "count")
# Express each count as a percentage of all responses from the same gender.
dis_by_gender_pct <- disorder_by_gender %>%
  group_by(gender) %>%
  mutate(total_count = sum(count)) %>%
  mutate(pct_by_gender = round(count / total_count * 100, 0)) %>%
  select(-total_count)
# Percentages for male responses.
dis_by_gender_pct_m <- dis_by_gender_pct %>%
  select(gender, disorder, pct_by_gender) %>%
  filter(gender == "Male")
dis_by_gender_pct_m
# Percentages for female responses.
dis_by_gender_pct_f <- dis_by_gender_pct %>%
  select(gender, disorder, pct_by_gender) %>%
  filter(gender == "Female")
dis_by_gender_pct_f
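The tables above break sleep disorders down by gender only. A comparable breakdown by age could follow the same count-then-percentage pattern. The sketch below is one possible version: the data frame `toy_sleep_data`, its values, and the cutoff at age 40 are all assumptions made for illustration, not choices taken from the analysis.

```r
library(dplyr)

# Toy stand-in for sleep_data; the values are invented for illustration only.
toy_sleep_data <- tibble(
  Age = c(28, 29, 33, 44, 52, 53),
  `Sleep Disorder` = c("None", "None", "Insomnia",
                       "Insomnia", "Sleep Apnea", "Sleep Apnea")
)

disorder_by_age <- toy_sleep_data %>%
  rename(age = Age, disorder = `Sleep Disorder`) %>%
  # Hypothetical two-way split; any other set of age bands would work the same way.
  mutate(age_band = if_else(age < 40, "Under 40", "40 and over")) %>%
  # Count responses per age band and disorder.
  count(age_band, disorder, name = "count") %>%
  # Percentage of each disorder within its age band.
  group_by(age_band) %>%
  mutate(pct_by_age = round(count / sum(count) * 100, 0)) %>%
  ungroup()

disorder_by_age
```

As with the age buckets used for sleep duration, bands with few responses would make these percentages unstable, so the band boundaries deserve the same care.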