Sleep Health and Lifestyle

    This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictitious people.

    The workspace is set up with one CSV file, data.csv, with the following columns:

    • Person ID
    • Gender
    • Age
    • Occupation
    • Sleep Duration: Average number of hours of sleep per day
    • Quality of Sleep: A subjective rating on a 1-10 scale
    • Physical Activity Level: Average number of minutes the person engages in physical activity daily
    • Stress Level: A subjective rating on a 1-10 scale
    • BMI Category
    • Blood Pressure: Indicated as systolic pressure over diastolic pressure
    • Heart Rate: In beats per minute
    • Daily Steps
    • Sleep Disorder: One of None, Insomnia or Sleep Apnea
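Because Blood Pressure is stored as text ("systolic over diastolic"), it usually has to be split into two numeric columns before it can be summarized or fed to a model. The following is a minimal sketch of one way to do that. The helper name `split_blood_pressure` and the new column names `Systolic` and `Diastolic` are assumptions made for illustration, and the two demo rows are made up rather than taken from data.csv.

```python
import pandas as pd

def split_blood_pressure(df, column="Blood Pressure"):
    """Return a copy of df with numeric 'Systolic' and 'Diastolic' columns.

    The helper and the new column names are illustrative assumptions.
    """
    # Capture the two integers around the slash, e.g. "126/83" -> "126", "83".
    # Rows that do not match the pattern yield NaN in both columns.
    parts = df[column].str.extract(r"^\s*(\d+)\s*/\s*(\d+)\s*$")
    out = df.copy()
    out["Systolic"] = pd.to_numeric(parts[0], errors="coerce")
    out["Diastolic"] = pd.to_numeric(parts[1], errors="coerce")
    return out

# Made-up rows purely for illustration (not taken from data.csv)
demo = pd.DataFrame({"Blood Pressure": ["126/83", "140/90"]})
demo = split_blood_pressure(demo)
print(demo)
```

On the real data the same call would be `sleep_data = split_blood_pressure(sleep_data)`, after which the original text column can be dropped if it is no longer needed.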

    Check out the guiding questions or the scenario described below to get started with this dataset! Feel free to make this workspace yours by adding and removing cells, or editing any of the existing cells.

    Source: Kaggle

    🌎 Some guiding questions to help you explore this data:

    1. Which factors could contribute to a sleep disorder?
    2. Does an increased physical activity level result in a better quality of sleep?
    3. Does the presence of a sleep disorder affect the subjective sleep quality metric?
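As a starting point for questions 2 and 3, the sketch below computes a rank correlation between physical activity and sleep quality, and summarizes sleep quality per sleep-disorder group. The function names are assumptions made for illustration, and the demo rows are made up. Note that a correlation only describes association; it cannot show that more activity causes better sleep. Depending on the pandas version, `read_csv` may parse the text "None" as a missing value, so the sketch restores that label before grouping.

```python
import pandas as pd

def activity_quality_rank_correlation(df):
    """Spearman-style correlation: Pearson correlation of the ranks.

    Ranks are used because Quality of Sleep is an ordinal 1-10 rating.
    """
    activity_rank = df["Physical Activity Level"].rank()
    quality_rank = df["Quality of Sleep"].rank()
    return activity_rank.corr(quality_rank)

def quality_by_disorder(df):
    """Count, mean and median of Quality of Sleep per Sleep Disorder group."""
    # Restore the "None" label in case it was read as a missing value
    disorder = df["Sleep Disorder"].fillna("None")
    return df.groupby(disorder)["Quality of Sleep"].agg(["count", "mean", "median"])

# Made-up rows purely for illustration (not taken from data.csv)
demo = pd.DataFrame({
    "Physical Activity Level": [30, 45, 60, 75],
    "Quality of Sleep": [5, 6, 8, 9],
    "Sleep Disorder": ["Insomnia", None, "None", "Sleep Apnea"],
})
print(activity_quality_rank_correlation(demo))
print(quality_by_disorder(demo))
```

Both functions can be called on `sleep_data` directly once the file is loaded.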

    📊 Visualization ideas

    • Boxplot: show the distribution of sleep duration or quality of sleep for each occupation.
    • Show the link between age and sleep duration with a scatterplot. Consider including information on the sleep disorder.

    🔍 Scenario: Automatically identify potential sleep disorders

    This scenario helps you develop an end-to-end project for your portfolio.

    Background: You work for a health insurance company and are tasked with identifying whether a potential client is likely to have a sleep disorder. The company wants to use this information to determine the premium the client will pay.

    Objective: Construct a classifier to predict the presence of a sleep disorder based on the other columns in the dataset.

    Check out our Linear Classifiers course (Python) or Supervised Learning course (R) for a quick introduction to building classifiers.
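One possible baseline for the objective above is a linear classifier in a scikit-learn pipeline. This is a sketch under stated assumptions, not the workspace's own solution: the choice of logistic regression, the helper names `prepare_features`, `build_classifier` and `fit_sleep_classifier`, and the decision to drop Person ID as a non-informative identifier are all assumptions. Text columns are one-hot encoded as they are, so Blood Pressure would be treated as a category unless it is split into numbers first.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

TARGET = "Sleep Disorder"

def prepare_features(df):
    """Split df into features X and target y."""
    # "None" may have been read as a missing value, so restore the label
    y = df[TARGET].fillna("None")
    # Person ID is an identifier, not a predictor (assumption)
    X = df.drop(columns=[TARGET, "Person ID"], errors="ignore")
    return X, y

def build_classifier(numeric_cols, categorical_cols):
    """Scale numeric columns, one-hot encode the rest, then fit a linear model."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])

def fit_sleep_classifier(df):
    """Fit the baseline pipeline on every non-target column of df."""
    X, y = prepare_features(df)
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    model = build_classifier(numeric_cols, categorical_cols)
    return model.fit(X, y)
```

Calling `fit_sleep_classifier(sleep_data)` trains on the full table. For an honest performance estimate, hold out a test set (for example with `sklearn.model_selection.train_test_split` and `stratify=y`) or use cross-validation instead of scoring on the training rows.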

    import pandas as pd
    
    sleep_data = pd.read_csv('data.csv')
    sleep_data.head()

    EDA

    sleep_data.info()

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    def categorical_summary(df):
        """Summarize each categorical column: number of unique values and their counts."""
        # Select columns with dtype 'object' or 'category'
        cat_cols = df.select_dtypes(include=['object', 'category']).columns

        # Initialize an empty DataFrame to store the summary
        summary_df = pd.DataFrame(columns=['Unique_Values', 'Value_Counts'])

        for col in cat_cols:
            # Count unique values (a missing value counts as one distinct value)
            unique_count = len(df[col].unique())

            # Get value counts as a string
            value_counts_str = df[col].value_counts().to_string()

            # Append to the summary DataFrame
            summary_df.loc[col] = [unique_count, value_counts_str]

        return summary_df

    categorical_summary(sleep_data)

    # Combine 'Normal' and 'Normal Weight'
    # Assign the result back to the column: replace(..., inplace=True) on a
    # selected column may not update sleep_data in recent pandas versions
    sleep_data['BMI Category'] = sleep_data['BMI Category'].replace('Normal Weight', 'Normal')
    sleep_data['BMI Category'].value_counts()

    sleep_data.describe()

    Exploring the Relationship Between Occupation and Sleep Patterns: A Comparative Analysis of Sleep Duration and Quality

    The following visualizations serve as a foundation for understanding the complex relationship between occupation and sleep health. They highlight potential areas of concern where occupational demands might negatively impact sleep, and conversely, where certain job types might correlate with better sleep health. This analysis is crucial for developing targeted interventions or recommendations for improving sleep health in specific professional groups.

    1. Sleep Duration by Occupation:

    • Range and Median: Different occupations show varied ranges in sleep duration. For instance, if certain occupations like 'Software Engineers' or 'Doctors' have a wider range, it might indicate a more inconsistent sleep pattern within these professions. The median line within each box indicates the typical sleep duration for each occupation, which can be compared across different jobs.
    • Outliers: If there are occupations with notable outliers (points that lie outside the typical range of the boxplot), this could suggest that individuals in these professions experience either significantly more or less sleep than their peers, possibly due to job-related stress or irregular work hours.
    • Tight vs. Wide Distributions: Some occupations might have tighter distributions (smaller boxes), suggesting more uniformity in sleep duration among individuals in these jobs. Others with wider distributions indicate more variability among workers in that field.

    2. Quality of Sleep by Occupation:

    • Highs and Lows: Pay attention to occupations that have higher median sleep quality scores versus those with lower scores. This can suggest which professions are associated with better perceived sleep quality.
    • Consistency: Occupations with boxes that have less spread (narrower boxes) indicate a more consistent sleep quality experience among individuals in that profession, whereas wider boxes show greater variance.
    • Skewness: The skewness in the distribution (whether the box and whiskers are symmetric or skewed to one side) can indicate a tendency towards higher or lower sleep quality ratings in certain occupations.
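The boxplots discussed above can be drawn along the following lines. The code for the original cell is not shown in this page, so this is a hedged sketch rather than the author's version: the function name `plot_sleep_by_occupation`, the figure size, and the alphabetical ordering of occupations are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_sleep_by_occupation(df):
    """Draw one boxplot per metric, with occupations on a shared x-axis."""
    metrics = ["Sleep Duration", "Quality of Sleep"]
    # Fix the category order so both panels list occupations identically
    order = sorted(df["Occupation"].unique())
    fig, axes = plt.subplots(len(metrics), 1, figsize=(12, 10), sharex=True)
    for ax, metric in zip(axes, metrics):
        sns.boxplot(data=df, x="Occupation", y=metric, order=order, ax=ax)
        ax.set_title(f"{metric} by Occupation")
    # Rotate the occupation labels so long names do not overlap
    axes[-1].tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig
```

In a notebook, `plot_sleep_by_occupation(sleep_data)` as the last line of a cell displays the figure. The box shows the interquartile range, the line inside it the median, and points beyond the whiskers are the outliers mentioned above.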

    Analyzing the Intersection of Age, Sleep Duration, and Sleep Disorders: A Visual Exploration

    The scatterplot visualizes the relationship between sleep duration, age, and the presence of sleep disorders, offering several valuable insights:

    1. Age and Sleep Duration Correlation: The plot provides a visual representation of how sleep duration varies with age. Look for any trends such as increasing or decreasing sleep duration as age increases. If a trend is visible, it suggests a potential age-related change in sleep patterns.
    2. Impact of Sleep Disorders: The use of different colors to represent different sleep disorders (None, Insomnia, Sleep Apnea) allows us to observe how these conditions are distributed across different age groups and how they might affect sleep duration. Key insights could include:

    • Whether certain sleep disorders are more prevalent in specific age groups.
    • How sleep duration varies among individuals with different sleep disorders. For instance, do those with insomnia tend to have shorter sleep durations?

    3. Cluster and Spread: The distribution and clustering of points can indicate common patterns. A dense clustering of points in a certain area suggests a common sleep duration for a specific age group, while a more spread-out distribution indicates variability.
    4. Outliers and Anomalies: Any points that lie far from the general cluster can indicate unusual cases, such as very young or old individuals with atypical sleep durations or specific sleep disorder patterns not common in their age group.

    This scatterplot combines three dimensions of the dataset (age, sleep duration, and sleep disorder) in a single view of how they relate. It helps show how sleep patterns differ across age groups and which age groups might be more susceptible to certain sleep disorders.
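A scatterplot of this kind can be sketched as follows. The code for the original cell is not shown in this page, so the function name `plot_age_vs_sleep`, the figure size, and the transparency setting are assumptions. The "None" label is restored first because, depending on the pandas version, `read_csv` may have parsed it as a missing value, which would silently drop those points from the plot.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_age_vs_sleep(df):
    """Scatter Age against Sleep Duration, colored by Sleep Disorder."""
    # Restore the "None" label so people without a disorder are plotted
    data = df.assign(**{"Sleep Disorder": df["Sleep Disorder"].fillna("None")})
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.scatterplot(
        data=data,
        x="Age",
        y="Sleep Duration",
        hue="Sleep Disorder",
        hue_order=["None", "Insomnia", "Sleep Apnea"],
        alpha=0.7,  # semi-transparent points make overlapping rows visible
        ax=ax,
    )
    ax.set_title("Sleep Duration vs. Age by Sleep Disorder")
    fig.tight_layout()
    return fig
```

Calling `plot_age_vs_sleep(sleep_data)` in a notebook cell displays the figure. Many rows share the same age and sleep duration, so the transparency helps show where points are stacked.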
