Snoring Analysis

    In this project I briefly analyzed my nighttime snoring. I tracked my snoring percentage for a couple of months with a commercial mobile app and then exported the data for analysis. First I ran a few diagnostics on the data, then I cleaned it, and finally I created some graphs that reveal a trend.

    # Import all necessary libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import numpy as np

    # Import the dataset with proper data types; the first 3 lines of the export precede the header row
    df = pd.read_csv(
        "Snore_lab.csv",
        skiprows=3,
        header=0,
        delimiter=",",
        index_col=False,
        parse_dates=["Start time", "Monitoring Start Time", "End Time"],
    )
    
    df.info()
    # Create new columns with the durations in seconds for statistical comparison
    df["Time Snoring Sec"] = pd.to_timedelta(df["Time Snoring"]).dt.total_seconds()
    
    df["Time Monitor Sec"] = pd.to_timedelta(df["Time Monitoring"]).dt.total_seconds()
    # Descriptive statistics to identify possible outliers
    print(df[["Time Monitor Sec", "Time Snoring Sec", "Snore Score", "Epic Snoring Percentage"]].describe())
    # Visual identification of outliers with a pair plot
    sns.set_palette("colorblind")
    sns.set_style("white")
    sns.pairplot(df, vars=["Time Monitor Sec", "Time Snoring Sec", "Snore Score", "Snoring Percentage", "Epic Snoring Percentage"], kind="reg", diag_kind="hist", corner=True)
    plt.show()
    # Visual identification of outliers with a boxplot
    sns.set_style("darkgrid")
    fig, ax = plt.subplots()
    sns.boxplot(y=df["Time Monitor Sec"], ax=ax)
    ax.set_xticks([])  # a single box needs no x tick labels
    ax.set_ylabel("Sleep time monitoring (seconds)")
    ax.set_title("Boxplot with outliers")
    plt.show()
    # Remove outliers: keep nights with 20,000 to 30,000 s (about 5.6 to 8.3 h) of monitoring
    df_clean = df[(df["Time Monitor Sec"] > 20000) & (df["Time Monitor Sec"] < 30000)]
    df_clean.info()
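The 20,000 and 30,000 s cut-offs above were picked by eye from the boxplot. A data-driven alternative is the 1.5 × IQR rule that matplotlib and seaborn boxplots use by default (`whis=1.5`) to decide which points are drawn as outliers. The sketch below is an assumption-laden illustration: the helper `iqr_bounds` and the sample durations are hypothetical, not taken from the real export; on the actual data one would pass `df["Time Monitor Sec"]`.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Hypothetical helper: return the (lower, upper) fences of the k * IQR rule."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical monitoring durations in seconds (invented, not the real export)
monitor_sec = pd.Series([25200, 26100, 24800, 27000, 25500, 3600, 26400, 41000])

low, high = iqr_bounds(monitor_sec)
# Keep only the values inside the fences (inclusive on both ends)
inliers = monitor_sec[monitor_sec.between(low, high)]
print(low, high)
print(inliers.tolist())
```

Unlike fixed thresholds, the fences adapt to the data, although with only a few months of nights they still depend on which nights happened to be recorded.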
    
    # Exploratory data analysis of the cleaned data
    sns.set_palette("colorblind")
    sns.set_style("white")
    sns.pairplot(df_clean, vars=["Time Monitor Sec", "Time Snoring Sec", "Snore Score", "Snoring Percentage", "Epic Snoring Percentage"], kind="reg", diag_kind="hist", corner=True)
    plt.show()
    print(df_clean[["Snoring Percentage", "Loud Snoring Percentage", "Epic Snoring Percentage"]].describe())
    
    # Sort the cleaned nights chronologically for a stacked bar plot of nightly snoring by severity
    df_sorted = (
        df_clean[["Snoring Percentage", "Epic Snoring Percentage", "Loud Snoring Percentage", "Mild Snoring Percentage", "End Time"]]
        .sort_values(by="End Time", ascending=True)
        .reset_index(drop=True)
    )
    
    # Means over all cleaned nights, used as horizontal reference lines
    epic_mean = df_clean["Epic Snoring Percentage"].mean()
    snoring_mean = df_clean["Snoring Percentage"].mean()
    
    # Plot the nightly values as bars stacked by severity (Epic at the bottom, then Loud, then Mild)
    sns.set_palette("colorblind")
    sns.set_style("darkgrid")
    fig, ax = plt.subplots(figsize=(10, 10))
    dates = df_sorted["End Time"].dt.strftime("%d-%m-%Y")
    ax.bar(dates, df_sorted["Epic Snoring Percentage"], color="red", label="Epic")
    ax.bar(dates, df_sorted["Loud Snoring Percentage"], bottom=df_sorted["Epic Snoring Percentage"], color="orange", label="Loud")
    ax.bar(
        dates,
        df_sorted["Mild Snoring Percentage"],
        bottom=df_sorted["Epic Snoring Percentage"] + df_sorted["Loud Snoring Percentage"],
        color="green",
        label="Mild",
    )
    
    # Adding horizontal lines with average values
    ax.axhline(epic_mean, linestyle = "--", color = "blue", label = "Epic mean")
    ax.axhline(snoring_mean, linestyle = "--", color = "black", label = "Snoring mean")
    
    # Enhancing plot features
    plt.xticks(rotation = 90)
    ax.set_xlabel("Date")
    ax.set_ylabel("Snoring (%)")
    ax.set_title("Snoring percentage stacked by severity")
    ax.legend()
    plt.show()
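Stacking Epic, Loud and Mild bars and comparing their tops with the snoring mean line assumes that the three severity columns add up to `Snoring Percentage`. This relationship is an assumption about the app's export rather than something documented here, so it is worth checking. A minimal sketch on made-up values follows; the tolerance of 0.5 percentage points for rounding in the export is also an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical nightly percentages: column names mirror the export, values are invented
nights = pd.DataFrame({
    "Snoring Percentage": [20.0, 35.5, 12.0],
    "Epic Snoring Percentage": [5.0, 10.5, 2.0],
    "Loud Snoring Percentage": [7.0, 15.0, 4.0],
    "Mild Snoring Percentage": [8.0, 10.0, 6.0],
})

parts = ["Epic Snoring Percentage", "Loud Snoring Percentage", "Mild Snoring Percentage"]
# Height of each stacked bar = sum of the three severity components
stack_height = nights[parts].sum(axis=1)
# Assumed tolerance for rounding in the exported percentages
consistent = np.isclose(stack_height, nights["Snoring Percentage"], atol=0.5)
print(consistent.all())
```

On the real data, replacing `nights` with `df_sorted` would show whether any night breaks the assumption.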
    # Monthly average of each severity for a stacked bar plot (the .resample() method would be a more natural fit for a time series)
    month = df_sorted["End Time"].dt.strftime("%Y-%m")
    epic_by_month = df_sorted.groupby(month)["Epic Snoring Percentage"].mean()
    loud_by_month = df_sorted.groupby(month)["Loud Snoring Percentage"].mean()
    mild_by_month = df_sorted.groupby(month)["Mild Snoring Percentage"].mean()
    
    # Plot the monthly averages on a stacked bar plot
    sns.set_palette("icefire")
    sns.set_style("darkgrid")
    fig, ax = plt.subplots(figsize=(9,9))
    ax.bar(epic_by_month.index, epic_by_month, label="Epic")
    ax.bar(loud_by_month.index, loud_by_month, bottom=epic_by_month, label="Loud")
    ax.bar(mild_by_month.index, mild_by_month, bottom=epic_by_month + loud_by_month, label="Mild")
    
    # Adding horizontal lines with average values
    ax.axhline(epic_mean, linestyle = "--", color = "red", label = "Epic mean")
    ax.axhline(snoring_mean, linestyle = "--", color = "blue", label = "Snoring mean")
    
    # Enhancing plot features
    plt.xticks(rotation=90)
    ax.set_xlabel("Month")
    ax.set_ylabel("Snoring (%)")
    ax.set_title("Monthly average of snoring percentage stacked by severity")
    plt.legend()
    plt.show()
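The comment above suggests `.resample()` as an alternative to the separate `groupby()` calls. A minimal sketch on a small hypothetical frame that reuses the export's column names (the dates and values are invented): with `End Time` as the index, a single `resample("MS").mean()` call averages all three severities per calendar month and labels each month by its first day.

```python
import pandas as pd

# Hypothetical measurements on irregular dates (invented, not the real export)
df_demo = pd.DataFrame({
    "End Time": pd.to_datetime(["2023-01-03", "2023-01-20", "2023-03-05", "2023-03-18"]),
    "Epic Snoring Percentage": [4.0, 6.0, 12.0, 8.0],
    "Loud Snoring Percentage": [6.0, 8.0, 10.0, 10.0],
    "Mild Snoring Percentage": [10.0, 12.0, 9.0, 11.0],
})

severity = ["Epic Snoring Percentage", "Loud Snoring Percentage", "Mild Snoring Percentage"]
# One call averages every severity per month; "MS" = month-start labels
monthly = df_demo.set_index("End Time")[severity].resample("MS").mean()
print(monthly)
```

One difference from the `groupby` version: `resample` also emits months without measurements (February here) as rows of `NaN`, so gaps in tracking stay visible instead of silently disappearing. For bar labels like those above, `monthly.index.strftime("%Y-%m")` yields the same `YYYY-MM` strings.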
    

    The plots show a clear trend in my nightly snoring percentage, which started increasing at the beginning of 2023. In April 2023 I snored for nearly one-third of the night. This might be linked to stress levels. Fortunately, the peak seems to be over and my snoring percentage is returning to its average. I will focus on my overall sleep quality in future projects. Stay tuned!

    # Let's see whether the snoring data are autocorrelated, i.e. whether the previous night's snoring might predict tonight's snoring.
    from statsmodels.graphics.tsaplots import plot_acf
    
    # Quick line plot of the time series, ordered by date so that lags follow measurement order
    ts = df_clean.sort_values("End Time").set_index("End Time")["Epic Snoring Percentage"]
    ts.plot()
    plt.show()
    
    # Visual check of autocorrelation
    plot_acf(ts, lags = 20, alpha = 0.05)
    plt.show()
    
    # Numeric check of autocorrelation of lag 5 measurements
    print(ts.autocorr(5))
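
With `alpha=0.05`, `plot_acf` shades a 95% confidence band, and bars outside it are read as significant. A simpler version of that test, assuming the series is white noise, compares each sample autocorrelation r_k with ±1.96/√n. The sketch below computes r_k with NumPy on a synthetic series that repeats every 5 measurements (invented data, chosen so that lag 5 stands out); `sample_acf` is a hypothetical helper. Two caveats: by default `plot_acf` uses Bartlett's formula, so its band widens with the lag and is not exactly ±1.96/√n; and `ts.autocorr(5)` is a Pearson correlation of the series with its shifted copy, which generally differs slightly from the r_5 bar in the plot.

```python
import numpy as np


def sample_acf(x, max_lag: int) -> np.ndarray:
    """Hypothetical helper: sample autocorrelation r_k for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])


# Invented series with a period of 5 measurements plus small noise
rng = np.random.default_rng(0)
x = np.tile([2.0, 3.0, 4.0, 3.0, 12.0], 8) + rng.normal(0, 0.5, 40)

r = sample_acf(x, max_lag=10)
band = 1.96 / np.sqrt(len(x))  # approximate 95% bound under a white-noise null
significant_lags = [k for k in range(1, 11) if abs(r[k]) > band]

# What Series.autocorr(5) computes: Pearson correlation of x[t] and x[t + 5]
pearson_lag5 = np.corrcoef(x[:-5], x[5:])[0, 1]
print(f"band = {band:.3f}, significant lags: {significant_lags}, r5 = {r[5]:.3f}, pearson = {pearson_lag5:.3f}")
```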

    There is a significant positive autocorrelation at lag 5, i.e. between each measurement and the fifth measurement after it. Because the data set does not contain every consecutive night, a lag counts measurements rather than days; to analyze lags in days, the missing nights would first have to be interpolated. I will focus on the time series of my snoring in a follow-up project. For now, I tentatively conclude that two consecutive nights do not affect each other in terms of epic snoring percentage.
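
The paragraph above notes that the data set lacks consecutive days and that missing measurement days would need to be interpolated. One way to do that in pandas, sketched on a hypothetical series indexed by `End Time` (dates and values are invented): `resample("D").mean()` puts the data on a daily grid, where nights without a measurement become `NaN` and two sessions ending on the same day would be averaged, and `interpolate(method="time")` then fills the gaps linearly in time.

```python
import pandas as pd

# Hypothetical epic-snoring percentages on non-consecutive nights (not the real export)
ts = pd.Series(
    [4.0, 10.0, 6.0],
    index=pd.to_datetime(["2023-04-01", "2023-04-04", "2023-04-05"]),
    name="Epic Snoring Percentage",
)

# Daily grid: one value per calendar day, NaN where no night was recorded
daily = ts.resample("D").mean()

# Fill the gaps linearly in time so that lag k now means k days
daily_filled = daily.interpolate(method="time")
print(daily_filled.tolist())
```

After this, `daily_filled.autocorr(1)` refers to consecutive days. Keep in mind that linear interpolation invents smooth values, which by itself pushes short-lag autocorrelation upward, so results on the filled series deserve caution when many nights are missing.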