SleepInc: Helping you find better sleep 😴

📖 Background

Your client is SleepInc, a sleep health company that recently launched a sleep-tracking app called SleepScope. The app monitors sleep patterns and collects users' self-reported data on lifestyle habits. SleepInc wants to identify lifestyle, health, and demographic factors that strongly correlate with poor sleep quality. They need your help to produce visualizations and a summary of findings for their next board meeting! They need these to be easily digestible for a non-technical audience!

💪 Challenge

Leverage this sleep data to analyze the relationship between lifestyle, health, demographic factors, and sleep quality. Your goal is to identify factors that correlate with poor sleep health.

Some examples:

  • Examine relationships between several factors like gender, occupation, physical activity, stress levels, and sleep quality/duration. Create visualizations to present your findings.
  • Produce recommendations on ways people can improve sleep health based on the patterns in the data.
  • Develop an accessible summary of study findings and recommendations for improving sleep health for non-technical audiences.

Summary and Recommendations

Summary of Process and Findings

To improve sleep quality, we recommend that people:

  • Maintain a healthy body weight
  • Achieve 8-8.5 hours of sleep a night
  • Have regular physical activity
  • Lower stress

Findings

In this notebook, we explore a dataset describing people's sleep alongside details of their lifestyle and health. We begin by exploring the dataset and the distribution of its data. We then examine the relationships between columns to see how aspects of daily life go hand in hand with sleep. In these explorations we found:

  • Sleep duration and sleep quality are positively correlated
  • Mean and median sleep quality vary by occupation
  • The occupations that get longer sleep also enjoy better sleep quality
  • Younger people tend to have lower sleep quality; sleep quality appears to increase with age, peaking around age 53
  • Higher Physical Activity Level correlates with higher sleep quality in most BMI Category groups
  • Stress and Heart Rate are negatively correlated with Quality of Sleep: the higher one's stress and heart rate, the worse one's quality of sleep usually is
  • BMI Category is negatively correlated with sleep quality: the higher one's BMI, the worse one's quality of sleep usually is
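As a rough illustration of how correlations like those listed above can be checked, the sketch below computes Spearman rank correlations against a quality score. The data is a made-up toy table, not rows from sleep_health_data.csv, and the column names "Sleep Duration", "Quality of Sleep", and "Stress Level" are assumptions about the dataset's headers.

```python
import pandas as pd

# Toy values for illustration only; NOT taken from sleep_health_data.csv.
# Column names are assumed to match the dataset's headers.
toy = pd.DataFrame({
    "Sleep Duration":   [6.0, 6.5, 7.0, 7.5, 8.0, 8.5],
    "Quality of Sleep": [5, 6, 6, 7, 8, 9],
    "Stress Level":     [8, 7, 6, 5, 4, 3],
})

# Spearman rank correlation is a reasonable choice for ordinal ratings
# such as a 1-10 quality score.
corr = toy.corr(method="spearman")

# Correlation of every other column with sleep quality.
quality_corr = corr["Quality of Sleep"].drop("Quality of Sleep")
print(quality_corr)
```

A positive value means the column tends to rise with sleep quality, and a negative value means it tends to fall. Correlation alone does not show that one factor causes the other.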

Exploring Data

We take a first look at the dataset to get a better idea of how to approach the problem. We start by checking the columns, NA values, and outliers.

import pandas as pd
df = pd.read_csv('sleep_health_data.csv')
df
df.describe()
def na_analysis(df):
    """
    Examine NA values in a DataFrame.
    
    Arguments:
        df (DataFrame): A pandas DataFrame for analysis.
    
    Returns:
        pd.DataFrame: One row per column with its NA count and NA ratio,
            sorted by NA ratio in descending order.
    """
    # counting na values
    na_count = df.isna().sum()
    
    # getting ratio of na values to rest of DataFrame
    na_ratio = na_count / len(df)
    
    # making the return DataFrame
    result = pd.DataFrame({
        'Column Name': na_count.index,
        'NA Value Counts' : na_count.values,
        'NA Value Ratio' : na_ratio.values.round(2)
    })
    
    return result.sort_values('NA Value Ratio', ascending = False)

na_analysis(df)
# finding outliers
def outliers(col):
    """Return a boolean mask marking values more than 1.5 * IQR beyond the quartiles."""
    seventy_fifth = col.quantile(0.75)
    twenty_fifth = col.quantile(0.25)
    # calculate IQR
    iqr = seventy_fifth - twenty_fifth
    # upper and lower thresholds
    upper = seventy_fifth + (iqr*1.5)
    lower = twenty_fifth - (iqr*1.5)
    # boolean mask: True where the value falls outside the thresholds
    is_outlier = ((col > upper) | (col < lower))
    return is_outlier

# number of outliers per numeric column
outliers_count = df.select_dtypes(include=['float64','int64']).apply(outliers).sum()
outliers_ratio = outliers_count/len(df)

# ratio of outliers as a percent
display(outliers_ratio * 100)

# keeping outliers for Heart Rate Column
outliers_heart_rate = outliers(df['Heart Rate'])
display(df[outliers_heart_rate])
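
To make the 1.5 * IQR rule used above concrete, here is a small worked example on a toy series. The values are invented for illustration and are not taken from the dataset's Heart Rate column.

```python
import pandas as pd

# Toy values for illustration only (not from the dataset).
heart_rate = pd.Series([65, 68, 70, 70, 72, 72, 74, 75, 86])

q1 = heart_rate.quantile(0.25)   # 70.0
q3 = heart_rate.quantile(0.75)   # 74.0
iqr = q3 - q1                    # 4.0

# Thresholds sit 1.5 * IQR beyond each quartile.
lower = q1 - 1.5 * iqr           # 64.0
upper = q3 + 1.5 * iqr           # 80.0

# Only 86 lies outside [64, 80], so it is the single flagged value.
flagged = heart_rate[(heart_rate < lower) | (heart_rate > upper)]
print(flagged.tolist())
```

A flagged value is not necessarily an error; it only sits far from the middle half of the data, so it is worth inspecting before deciding whether to keep it.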

Creating a categorical column for sleep quality
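
One possible way to build such a column is to bin a numeric rating with pd.cut. The sketch below is an assumption rather than the notebook's own cell: the column name "Quality of Sleep", the new column name "Sleep Quality Category", the cut points, and the labels are all hypothetical, and the data is a toy table.

```python
import pandas as pd

# Toy ratings for illustration only; the column name is an assumption.
toy = pd.DataFrame({"Quality of Sleep": [4, 5, 6, 7, 8, 9]})

# Hypothetical cut points: (0, 5] -> Poor, (5, 7] -> Average, (7, 10] -> Good.
toy["Sleep Quality Category"] = pd.cut(
    toy["Quality of Sleep"],
    bins=[0, 5, 7, 10],
    labels=["Poor", "Average", "Good"],
)

print(toy)
```

pd.cut returns an ordered categorical by default, so the groups keep a sensible order when sorted or plotted.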
