SleepInc: Helping you find better sleep 😴

📖 Background

Your client is SleepInc, a sleep health company that recently launched a sleep-tracking app called SleepScope. The app monitors sleep patterns and collects users' self-reported data on lifestyle habits. SleepInc wants to identify lifestyle, health, and demographic factors that strongly correlate with poor sleep quality. They need your help to produce visualizations and a summary of findings for their next board meeting! They need these to be easily digestible for a non-technical audience!

💾 The data

SleepInc has provided you with an anonymized dataset of sleep and lifestyle metrics for 374 individuals. This dataset contains average values for each person calculated over the past six months.

The dataset includes 13 columns covering sleep duration, quality, disorders, exercise, stress, diet, demographics, and other factors related to sleep health.

Column | Description
Person ID | An identifier for each individual.
Gender | The gender of the person (Male/Female).
Age | The age of the person in years.
Occupation | The occupation or profession of the person.
Sleep Duration (hours) | The average number of hours the person sleeps per day.
Quality of Sleep (scale: 1-10) | A subjective rating of the quality of sleep, ranging from 1 to 10.
Physical Activity Level (minutes/day) | The average number of minutes the person engages in physical activity daily.
Stress Level (scale: 1-10) | A subjective rating of the stress level experienced by the person, ranging from 1 to 10.
BMI Category | The BMI category of the person (e.g., Underweight, Normal, Overweight).
Blood Pressure (systolic/diastolic) | The average blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.
Heart Rate (bpm) | The average resting heart rate of the person in beats per minute.
Daily Steps | The average number of steps the person takes per day.
Sleep Disorder | The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).

Acknowledgments: Laksika Tharmalingam, Kaggle: https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset (this is a fictitious dataset)

import pandas as pd
raw_data = pd.read_csv('sleep_health_data.csv')
raw_data

💪 Challenge

Leverage this sleep data to analyze the relationship between lifestyle, health, demographic factors, and sleep quality. Your goal is to identify factors that correlate with poor sleep health.

Some examples:

  • Examine relationships between several factors like gender, occupation, physical activity, stress levels, and sleep quality/duration. Create visualizations to present your findings.
  • Produce recommendations on ways people can improve sleep health based on the patterns in the data.
  • Develop an accessible summary of study findings and recommendations for improving sleep health for non-technical audiences.

πŸ§‘β€βš–οΈ Judging criteria

This competition is for helping to understand how competitions work. This competition will not be judged.

✅ Checklist before publishing into the competition

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your story.
  • Make sure the workbook reads well and explains how you found your insights.
  • Try to include an executive summary of your recommendations at the beginning.
  • Check that all the cells run without error.

βŒ›οΈ Time is ticking. Good luck!

Hi, my name is Brian, and this is my workbook for the challenge. Although the competition is already over, there is still much to learn from it, so I'm attempting the challenge to brush up on my skills and learn from applying them.

With this challenge in mind, we need to set some goals and plan the content of the workbook.

Our goals for this challenge are to:

  1. Find relationships between the variables, treating Sleep Disorder, Stress Level, Quality of Sleep and Sleep Duration as the main outcomes to compare other factors against. These four are not only outcomes, since they can also be correlated with one another.
  2. Research and recommend ways that people can improve their sleep health, linking the patterns in this dataset to other datasets and to online recommendations.

With these two goals in mind, here is the content for the challenge in this workbook.

TLDR

Skip this part if you are interested in the thinking process. This section highlights the major insights from the challenge, which are covered in detail later, and summarises the findings in one picture that acts as a "dashboard".

Brief Summary on the Content

Sleep disorders are widespread in this day and age, especially as the economy has boomed over the past decades. While that boom is not proven to be the main reason, sleep disorders can arise from a variety of sources, including work, stress, or poor sleep hygiene such as using phones before bed. A sleep disorder is defined here as a condition that disrupts the normal sleep cycle or pattern. The definition is generic because there are many types of sleep disorder, from common cases like insomnia to more extreme cases like sleepwalking.

Sleep disorders are believed to cause many other symptoms, such as:

  1. Motor overactivity
  2. Inattentiveness
  3. Irritability

These symptoms can be present when a person has a sleep disorder, and they in turn disrupt important functions such as motivation and motor skills.

The current challenge is to understand how sleep disorders relate to the factors that might contribute to them, to find what correlates with sleep disorders and, hopefully, to reduce their occurrence or recurrence.

Exploratory Data Analysis (EDA)

To analyse the data and extract information from it, we must first perform EDA and clean the data of potential errors that could lead to incorrect conclusions later. These errors can take the form of missing data, non-standardised values, values that are out of range, and so on.

We will use SQL to quickly determine whether these errors are present and take appropriate steps to correct them. Note that while spotting errors is one aspect of EDA, the way an error is fixed can change the results. For a dataset this small we could scroll through it manually to check for missing information, but the aim here is to document a step-by-step cleaning process that also works for bigger datasets.
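As a rough illustration of the out-of-range check mentioned above, here is a minimal pandas sketch. It assumes the CSV headers omit the units shown in the data table (the queries in this workbook use "Heart Rate" without "(bpm)"). The helper name find_out_of_range, the EXPECTED_RANGES mapping, and the three sample rows are made up for illustration and are not part of the original analysis.

```python
import pandas as pd

# Assumed ranges, taken from the column descriptions (both scales run from 1 to 10).
EXPECTED_RANGES = {
    "Quality of Sleep": (1, 10),
    "Stress Level": (1, 10),
}


def find_out_of_range(df: pd.DataFrame, ranges: dict) -> pd.DataFrame:
    """Return the rows where any listed column falls outside its expected range."""
    bad = pd.Series(False, index=df.index)
    for column, (low, high) in ranges.items():
        # between() is inclusive on both ends; missing values are flagged too.
        bad |= ~df[column].between(low, high)
    return df[bad]


# Illustrative rows only, not taken from the real dataset.
sample = pd.DataFrame({
    "Person ID": [1, 2, 3],
    "Quality of Sleep": [6, 11, 8],
    "Stress Level": [5, 4, 0],
})

out_of_range = find_out_of_range(sample, EXPECTED_RANGES)
print(out_of_range)
```

On the real data, the same helper could be called on raw_data once the actual column names are confirmed.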

First, we check for missing data by running the following query.

SELECT "Sleep Disorder" 
FROM 'sleep_health_data.csv'
WHERE "Sleep Disorder" IS NULL;

We run this query for every column and, luckily, none of them contains missing data. Moreover, the 'None' value in the "Sleep Disorder" column is a valid category (no sleep disorder) and is not counted as missing data.
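The same check can be sketched in pandas, with one caveat: depending on the pandas version, read_csv may treat the literal text None as a missing-value marker by default, which would make every "no disorder" row look missing. The sketch below assumes only empty fields should count as missing. The inline CSV text is illustrative and stands in for 'sleep_health_data.csv'.

```python
import io

import pandas as pd

# Illustrative CSV text only; the real file is 'sleep_health_data.csv'.
csv_text = """Person ID,Occupation,Sleep Disorder
1,Nurse,None
2,Doctor,Insomnia
3,,Sleep Apnea
"""

# keep_default_na=False switches off pandas' built-in list of missing-value
# markers, and na_values=[""] then treats only empty fields as missing.
sample = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

# Count missing entries per column in a single pass.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

This counts missing values for all columns at once, rather than one query per column, and keeps 'None' as an ordinary category in "Sleep Disorder".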

Second, we check for duplicates. This is more a step for documentation than a necessary one, since the Person ID column already indicates that no record is duplicated.

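-- Count rows that share the same ID, occupation, heart rate and daily steps;
-- any group returned here appears more than once.
SELECT COUNT(*) AS row_count,
	"Person ID",
	"Occupation",
	"Heart Rate",
	"Daily Steps"
FROM 'sleep_health_data.csv'
GROUP BY "Person ID", "Occupation", "Heart Rate", "Daily Steps"
HAVING COUNT(*) > 1;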
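For comparison, a minimal pandas sketch of the same duplicate check follows. The first test mirrors the grouping in the query. The second, which drops "Person ID", is an extra check that is not part of the original query: it shows whether two different IDs carry identical measurements. The sample rows are made up for illustration.

```python
import pandas as pd

# Illustrative rows only, not taken from the real dataset.
sample = pd.DataFrame({
    "Person ID": [1, 2, 3, 4],
    "Occupation": ["Nurse", "Doctor", "Nurse", "Engineer"],
    "Heart Rate": [70, 72, 70, 68],
    "Daily Steps": [8000, 5000, 8000, 7000],
})

# Same grouping as the query: a row is a duplicate only if every key column matches.
key_columns = ["Person ID", "Occupation", "Heart Rate", "Daily Steps"]
exact_duplicates = sample[sample.duplicated(subset=key_columns, keep=False)]

# Extra check (an assumption, not in the original query): ignore the identifier
# to see whether different IDs share identical measurements.
value_columns = ["Occupation", "Heart Rate", "Daily Steps"]
same_values = sample[sample.duplicated(subset=value_columns, keep=False)]

print(len(exact_duplicates), len(same_values))
```

Because every Person ID is unique, the first check cannot find duplicates by construction, which is why the second check can be a useful complement.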