Project: Dr. Semmelweis and the Importance of Handwashing

Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s, as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. This was an unorthodox and controversial request, because nobody in Vienna knew about bacteria at the time.

You will reanalyze the data that led Semmelweis to discover the importance of handwashing, and measure the impact handwashing had on the number of deaths at the hospital.

The data is stored as two CSV files within the data folder.

data/yearly_deaths_by_clinic.csv contains the yearly number of births and deaths at the two clinics of the Vienna General Hospital between the years 1841 and 1846.

Column	Description
`year`	Years (1841-1846)
`births`	Number of births
`deaths`	Number of deaths
`clinic`	Clinic 1 or clinic 2

data/monthly_deaths.csv contains data from 'Clinic 1' of the hospital where most deaths occurred.

Column	Description
`date`	Date (YYYY-MM-DD)
`births`	Number of births
`deaths`	Number of deaths

# Imported libraries
import pandas as pd
import matplotlib.pyplot as plt

yearly = pd.read_csv("data/yearly_deaths_by_clinic.csv")

# Proportion of deaths per birth, kept unrounded so that rounding cannot create ties
yearly["prop"] = yearly["deaths"] / yearly["births"]

# Year with the highest yearly proportion of deaths across both clinics
highest_year = int(yearly.loc[yearly["prop"].idxmax(), "year"])

# Plot the yearly proportion of deaths, one line per clinic
plt.figure(figsize=(12, 6))
for clinic in yearly["clinic"].unique():
    clinic_data = yearly[yearly["clinic"] == clinic]
    plt.plot(clinic_data["year"], clinic_data["prop"], marker='o', label=clinic)

plt.title("Yearly Proportion of Deaths by Clinic")
plt.xlabel("Year")
plt.ylabel("Proportion of Deaths")
plt.legend()
plt.grid(False)
plt.show()

print("highest_year:", highest_year)
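
The highest_year value above is a single year taken across both clinics. If a separate answer per clinic is wanted, one possible sketch is to group by clinic before taking the maximum. The toy_yearly frame below is made-up illustration data, not the hospital records, and the names toy_yearly and highest_by_clinic are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (NOT the real hospital figures), same columns as the yearly file
toy_yearly = pd.DataFrame({
    "year":   [1841, 1842, 1841, 1842],
    "births": [1000, 1000, 1000, 1000],
    "deaths": [50, 100, 30, 20],
    "clinic": ["clinic 1", "clinic 1", "clinic 2", "clinic 2"],
})
toy_yearly["prop"] = toy_yearly["deaths"] / toy_yearly["births"]

# Row label of the largest proportion within each clinic
idx = toy_yearly.groupby("clinic")["prop"].idxmax()

# Look those rows up and index the result by clinic
highest_by_clinic = toy_yearly.loc[idx, ["clinic", "year", "prop"]].set_index("clinic")
print(highest_by_clinic)
```

Applied to the real data, the same pattern would use yearly in place of toy_yearly.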

# Imported libraries
import pandas as pd
import matplotlib.pyplot as plt

monthly = pd.read_csv("data/monthly_deaths.csv")

# Parse the dates and compute the monthly proportion of deaths
monthly["date"] = pd.to_datetime(monthly["date"])
monthly["prop"] = monthly["deaths"] / monthly["births"]

# Mean proportion of deaths before and after handwashing began on 1847-06-01
before_handwashing = monthly[monthly["date"] < "1847-06-01"]["prop"].mean().round(2)
after_handwashing = monthly[monthly["date"] >= "1847-06-01"]["prop"].mean().round(2)

# Create summary DataFrame
monthly_summary = pd.DataFrame({
    "handwashing_started": [False, True],
    "mean_prop_deaths": [before_handwashing, after_handwashing]
})

monthly_summary
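
The same summary table can also be built by flagging each month and grouping on that flag, which avoids filtering the frame twice. This is only a sketch under assumptions: toy_monthly holds invented numbers rather than the Clinic 1 records, and toy_summary is a hypothetical name.

```python
import pandas as pd

# Hypothetical toy data (NOT the real Clinic 1 figures), same columns as the monthly file
toy_monthly = pd.DataFrame({
    "date": pd.to_datetime(["1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01"]),
    "births": [100, 200, 100, 200],
    "deaths": [10, 40, 2, 8],
})
toy_monthly["prop"] = toy_monthly["deaths"] / toy_monthly["births"]

# True for months on or after the date handwashing was introduced
toy_monthly["handwashing_started"] = toy_monthly["date"] >= pd.Timestamp("1847-06-01")

# One row per flag value, holding the rounded mean proportion of deaths
toy_summary = (
    toy_monthly.groupby("handwashing_started")["prop"]
    .mean()
    .round(2)
    .reset_index(name="mean_prop_deaths")
)
print(toy_summary)
```

The resulting frame has the same two columns as monthly_summary, with the False row first.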

import numpy as np
import scipy.stats as stats

# Separate the data
before_handwashing = monthly[monthly["date"] < "1847-06-01"]["prop"]
after_handwashing = monthly[monthly["date"] >= "1847-06-01"]["prop"]

# Calculate the mean difference
mean_diff = after_handwashing.mean() - before_handwashing.mean()

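# Calculate the standard error of the difference in means (unpooled variances, Welch)
var_mean_before = before_handwashing.var() / len(before_handwashing)
var_mean_after = after_handwashing.var() / len(after_handwashing)
se_diff = np.sqrt(var_mean_before + var_mean_after)

# Welch-Satterthwaite degrees of freedom, consistent with the unpooled standard error
df_welch = (var_mean_before + var_mean_after) ** 2 / (
    var_mean_before ** 2 / (len(before_handwashing) - 1)
    + var_mean_after ** 2 / (len(after_handwashing) - 1)
)

# Calculate the 95% confidence interval
confidence_interval = stats.t.interval(0.95, df=df_welch, loc=mean_diff, scale=se_diff)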

# Store the result in a pandas series
confidence_interval = pd.Series(confidence_interval, index=["lower_bound", "upper_bound"])

print(confidence_interval)
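
As a cross-check on the t-based interval, a percentile bootstrap of the difference in means makes no distributional assumption about the monthly proportions. The sketch below swaps in that technique and is not part of the analysis above. The toy_before and toy_after arrays are invented values, not the real data, and n_boot and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical monthly proportions (NOT the real data)
toy_before = np.array([0.10, 0.12, 0.09, 0.11, 0.13, 0.08])
toy_after = np.array([0.02, 0.03, 0.01, 0.02, 0.04, 0.02])

n_boot = 2000
boot_diffs = np.empty(n_boot)
for i in range(n_boot):
    # Resample each group with replacement, keeping the group sizes
    b = rng.choice(toy_before, size=toy_before.size, replace=True)
    a = rng.choice(toy_after, size=toy_after.size, replace=True)
    boot_diffs[i] = a.mean() - b.mean()

# 95% percentile interval for the difference (after minus before)
lower, upper = np.percentile(boot_diffs, [2.5, 97.5])
observed = toy_after.mean() - toy_before.mean()
print(lower, observed, upper)
```

With the real data, before_handwashing.to_numpy() and after_handwashing.to_numpy() would take the place of the toy arrays. An interval that lies entirely below zero would agree with a drop in the proportion of deaths.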