Snoring Analysis

In this project I take a brief look at my snoring during sleep. I have been tracking my snoring percentage for a couple of months with a commercial mobile app. I finally exported the data and examine it here. First I ran a few diagnostics on the data at hand, then I cleaned it, and at the end I created some plots that reveal a trend.

#Importing all necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from datetime import datetime

#Importing dataset with proper data types
df = pd.read_csv("Snore_lab.csv", skiprows = 3,header = 0, delimiter = ",", index_col=False, parse_dates=["Start time", "Monitoring Start Time", "End Time"])

df.info()

#Creation of new columns to track times in seconds for statistical comparison
df["Time Snoring Sec"] = pd.to_timedelta(df["Time Snoring"]).dt.total_seconds()

df["Time Monitor Sec"] = pd.to_timedelta(df["Time Monitoring"]).dt.total_seconds()
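
To make the conversion above concrete, here is a minimal sketch on made-up values. It assumes the app exports durations as "HH:MM:SS" text, which is an assumption about the file format; the sample values are hypothetical and not taken from my data.

```python
import pandas as pd

# Hypothetical durations in the assumed "HH:MM:SS" text format
durations = pd.Series(["00:45:30", "01:10:00", "07:30:15"])

# Parse the strings into timedeltas, then express them as seconds
seconds = pd.to_timedelta(durations).dt.total_seconds()
print(seconds.tolist())
```

Working in plain seconds keeps the columns numeric, so they can go straight into describe() and the pair plots.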

#Descriptive statistics to identify possible outliers
print(df[["Time Monitor Sec", "Time Snoring Sec", "Snore Score", "Epic Snoring Percentage"]].describe())

#Visual identification of outliers
sns.set_palette("colorblind")
sns.set_style("white")
sns.pairplot(df, vars=["Time Monitor Sec", "Time Snoring Sec", "Snore Score", "Snoring Percentage", "Epic Snoring Percentage"], kind="reg", diag_kind="hist", corner=True)
plt.show()

#Visual identification of outliers with boxplot
sns.set_style("darkgrid")
fig, ax = plt.subplots()
# Drawing on the axes created above and hiding the x ticks (there is only one box)
sns.boxplot(data=df["Time Monitor Sec"], ax=ax)
ax.set_xticks([])
ax.set_ylabel("Sleep time monitoring (seconds)")
ax.set_title("Boxplot with outliers")
plt.show()

#Removing outliers: keeping only nights with 20000 to 30000 seconds of monitoring (roughly 5.6 to 8.3 hours)
df_clean = df[(df["Time Monitor Sec"] > 20000) & (df["Time Monitor Sec"] < 30000)]
df_clean.info()
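
The 20000 and 30000 second limits above were picked by eye from the boxplot. As a possible alternative, the limits could be derived from the data with Tukey's 1.5 * IQR rule, the same rule that sets the default boxplot whiskers. The sketch below uses hypothetical monitoring times, not my real measurements.

```python
import pandas as pd

# Hypothetical monitoring times in seconds, with two obvious outliers
monitor_sec = pd.Series([25000, 26000, 24500, 27000, 25500, 600, 50000])

# Tukey's rule: keep values within 1.5 * IQR of the quartiles
q1 = monitor_sec.quantile(0.25)
q3 = monitor_sec.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# between() is inclusive on both ends by default
kept = monitor_sec[monitor_sec.between(lower, upper)]
print(lower, upper)
print(kept.tolist())
```

Whether data-driven limits are better than fixed ones depends on the goal: fixed limits encode what I consider a normal night, while the IQR rule adapts to whatever the sample looks like.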

#Exploratory data analysis of cleaned data
sns.set_palette("colorblind")
sns.set_style("white")
sns.pairplot(df_clean, vars=["Time Monitor Sec", "Time Snoring Sec", "Snore Score", "Snoring Percentage", "Epic Snoring Percentage"], kind="reg", diag_kind="hist", corner=True)
plt.show()

print(df_clean[["Snoring Percentage", "Loud Snoring Percentage", "Epic Snoring Percentage"]].describe())

# Stacked bar plot for individual values and severity of Snoring Percentage
df_sorted = df_clean[["Snoring Percentage", "Epic Snoring Percentage", "Loud Snoring Percentage", "Mild Snoring Percentage", "End Time"]].sort_values(by="End Time",ascending=True).reset_index(drop=True)

# Calculating mean of the sample
epic_mean = df_clean["Epic Snoring Percentage"].mean()
snoring_mean = df_clean["Snoring Percentage"].mean()

# Plotting data on stacked bar plot
sns.set_palette("colorblind")
sns.set_style("darkgrid")
fig, ax = plt.subplots(figsize=(10, 10))
ax.bar(df_sorted["End Time"].dt.strftime('%d-%m-%Y'), df_sorted["Epic Snoring Percentage"], color = "red", label = "Epic")
ax.bar(df_sorted["End Time"].dt.strftime('%d-%m-%Y'), df_sorted["Loud Snoring Percentage"], bottom = df_sorted["Epic Snoring Percentage"], color = "orange", label = "Loud")
ax.bar(df_sorted["End Time"].dt.strftime('%d-%m-%Y'), df_sorted["Mild Snoring Percentage"], bottom = df_sorted["Epic Snoring Percentage"] + df_sorted["Loud Snoring Percentage"], color = "green", label = "Mild")

# Adding horizontal lines with average values
ax.axhline(epic_mean, linestyle = "--", color = "blue", label = "Epic mean")
ax.axhline(snoring_mean, linestyle = "--", color = "black", label = "Snoring mean")

# Enhancing plot features
plt.xticks(rotation = 90)
ax.set_xlabel("Date")
ax.set_ylabel("Snoring (%)")
ax.set_title("Snoring percentage stacked by severity")
ax.legend()
plt.show()
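
The stacking above works because each bar segment starts where the previous severities end. A small sketch of that arithmetic, on hypothetical percentages with shortened column names, might look like this:

```python
import pandas as pd

# Hypothetical severity percentages for three nights
severity = pd.DataFrame(
    {
        "Epic": [2.0, 5.0, 1.0],
        "Loud": [6.0, 4.0, 3.0],
        "Mild": [10.0, 12.0, 8.0],
    }
)

# The running total across columns gives the top of each segment;
# subtracting the segment's own height gives the level where it starts.
tops = severity.cumsum(axis=1)
bottoms = tops - severity
print(bottoms)
```

Each column of bottoms could then be passed as the bottom argument of ax.bar in a loop, which avoids writing the sums by hand when more severity levels are added.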

# Stacked bar plot of Snoring Percentage by severity, averaged per calendar month
# Note: the .resample() method would be a simpler alternative for time series
epic_by_month = df_sorted.groupby(df_sorted['End Time'].dt.strftime('%Y-%m'))['Epic Snoring Percentage'].mean()
loud_by_month = df_sorted.groupby(df_sorted['End Time'].dt.strftime('%Y-%m'))['Loud Snoring Percentage'].mean()
mild_by_month = df_sorted.groupby(df_sorted['End Time'].dt.strftime('%Y-%m'))['Mild Snoring Percentage'].mean()

# Plotting data on stacked bar plot
# Setting the palette (a bare sns.color_palette() call only returns colors and changes nothing)
sns.set_palette("icefire")
sns.set_style("darkgrid")
fig, ax = plt.subplots(figsize=(9,9))
ax.bar(epic_by_month.index, epic_by_month, label="Epic")
ax.bar(loud_by_month.index, loud_by_month, bottom=epic_by_month, label="Loud")
ax.bar(mild_by_month.index, mild_by_month, bottom=epic_by_month + loud_by_month, label="Mild")

# Adding horizontal lines with average values
ax.axhline(epic_mean, linestyle = "--", color = "red", label = "Epic mean")
ax.axhline(snoring_mean, linestyle = "--", color = "blue", label = "Snoring mean")

# Enhancing plot features
plt.xticks(rotation=90)
ax.set_xlabel("Month")
ax.set_ylabel("Snoring (%)")
ax.set_title("Monthly average of snoring percentage stacked by severity")
ax.legend()
plt.show()
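
The monthly averages above were computed with three separate groupby calls. A sketch of the .resample() alternative is shown below; the dates and values are hypothetical, and only two of the severity columns are included to keep it short.

```python
import pandas as pd

# Hypothetical nightly values; dates and numbers are made up for illustration
nights = pd.DataFrame(
    {
        "End Time": pd.to_datetime(
            [
                "2023-01-05 07:00",
                "2023-01-20 06:30",
                "2023-02-03 07:15",
                "2023-02-25 06:45",
            ]
        ),
        "Epic Snoring Percentage": [1.0, 3.0, 4.0, 6.0],
        "Loud Snoring Percentage": [5.0, 7.0, 8.0, 10.0],
    }
)

# resample() needs a datetime index; "MS" labels each bin by its month start
monthly = nights.set_index("End Time").resample("MS").mean()
print(monthly)
```

One difference from the groupby approach: resample() also creates rows for months without any measurement (filled with NaN), so gaps in the tracking stay visible on the time axis.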

We can observe a clear trend in my nightly snoring percentage, which started increasing at the beginning of 2023. In April 2023 I was snoring for nearly one-third of the whole night. This might be linked to stress levels. Luckily, the peak seems to be over and my snoring percentage is returning to average. I will focus on my overall sleep quality in upcoming projects. Stay tuned!

# Let's see whether there is some autocorrelation in the snoring data, i.e. whether the previous night's snoring might predict tonight's snoring.
from statsmodels.graphics.tsaplots import plot_acf

# Quick line plot of the time series
# Using the frame sorted by "End Time" so that lags follow chronological order
ts = df_sorted["Epic Snoring Percentage"]
ts.plot()
plt.show()

# Visual check of autocorrelation
plot_acf(ts, lags = 20, alpha = 0.05)
plt.show()

# Numeric check of autocorrelation at lag 5 (each measurement vs. the one five measurements earlier)
print(ts.autocorr(5))
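
For reference, Series.autocorr(lag) is the Pearson correlation between the series and a copy of itself shifted by lag positions. The sketch below checks this on a hypothetical series; the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical series standing in for the nightly percentages
values = pd.Series([2.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0, 9.0])

lag = 2
# autocorr(lag) correlates the series with itself shifted by `lag` positions
by_method = values.autocorr(lag)
by_hand = values.corr(values.shift(lag))
print(by_method, by_hand)
```

Note that the shift is by position, not by calendar day, so with gaps in the data a lag of 5 means five measurements back. The values drawn by plot_acf can also differ slightly from autocorr(), because statsmodels uses a different estimator based on the mean and variance of the whole series.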

There is a significant positive autocorrelation at lag 5, which is the 6th point on the autocorrelation plot. The data set does not cover every consecutive day, so the missing nights would need to be interpolated before lags can be read as calendar days. I will definitely focus on the snoring time series in a follow-up project. For now I conclude that two consecutive measurements do not affect each other in terms of epic snoring percentage.
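
The interpolation step mentioned above could be sketched as follows. This is only an illustration on hypothetical dates and values; with the real data, the "End Time" timestamps would first have to be reduced to calendar dates (for example with .dt.normalize()), which is an assumption about how the nights should be aligned.

```python
import pandas as pd

# Hypothetical measurements with missing nights (Jan 3, Jan 5 and Jan 6)
measured = pd.Series(
    [2.0, 4.0, 8.0, 2.0],
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-04", "2023-01-07"]),
)

# Put the series on a daily grid (missing nights become NaN), then fill linearly
daily = measured.asfreq("D").interpolate(method="linear")
print(daily)
```

Interpolated nights are smooth by construction, so they tend to inflate short-lag autocorrelation; any result on the filled series should be read with that in mind.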