Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. It was an unorthodox and controversial request; nobody in Vienna knew about bacteria at the time.
You will reanalyze the data that led Semmelweis to discover the importance of handwashing, and examine its impact on the number of deaths at the hospital.
The data is stored as two CSV files within the data folder.
data/yearly_deaths_by_clinic.csv contains the number of births and deaths at the two clinics at the Vienna General Hospital for the years 1841 to 1846.
| Column | Description |
|---|---|
| year | Years (1841-1846) |
| births | Number of births |
| deaths | Number of deaths |
| clinic | Clinic 1 or clinic 2 |
data/monthly_deaths.csv contains data from 'Clinic 1' of the hospital where most deaths occurred.
| Column | Description |
|---|---|
| date | Date (YYYY-MM-DD) |
| births | Number of births |
| deaths | Number of deaths |
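The column layout described above can be sanity-checked on load. The following is only a minimal sketch: the rows in `sample_csv` are made-up placeholders that mimic the documented columns of data/monthly_deaths.csv (they are not hospital records), and `expected_columns` is a hypothetical helper name introduced for illustration.

```python
import io

import pandas as pd

# Made-up rows that only mimic the documented columns; NOT the hospital records.
sample_csv = io.StringIO(
    "date,births,deaths\n"
    "1900-01-01,100,5\n"
    "1900-02-01,120,6\n"
)

# parse_dates converts the YYYY-MM-DD strings to datetime64 while loading.
sample = pd.read_csv(sample_csv, parse_dates=["date"])

# Hypothetical check: fail early if a documented column is absent.
expected_columns = ["date", "births", "deaths"]
missing = [col for col in expected_columns if col not in sample.columns]
if missing:
    raise ValueError(f"missing columns: {missing}")

# Proportion of deaths per row, the quantity used throughout the analysis.
sample["death_rate"] = sample["deaths"] / sample["births"]
```

Parsing the dates at load time is one option; converting afterwards with `pd.to_datetime` gives the same result.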
# Imported libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
# Load data
yearly_data = pd.read_csv('data/yearly_deaths_by_clinic.csv')
monthly_data = pd.read_csv('data/monthly_deaths.csv')
# Calculate yearly proportion of deaths
yearly_data['death_rate'] = yearly_data['deaths'] / yearly_data['births']
# Find the year with the highest death rate across both clinics
highest_year = int(yearly_data.loc[yearly_data['death_rate'].idxmax(), 'year'])
highest_year

# Convert date column to datetime in monthly data
monthly_data['date'] = pd.to_datetime(monthly_data['date'])
# Identify handwashing period
handwashing_start = pd.Timestamp("1847-06-01")
monthly_data['handwashing_started'] = monthly_data['date'] >= handwashing_start
# Compute mean proportion of deaths before and after handwashing
monthly_data['death_rate'] = monthly_data['deaths'] / monthly_data['births']
monthly_summary = monthly_data.groupby('handwashing_started')['death_rate'].mean().reset_index()
monthly_summary

# Bootstrap analysis for confidence interval
before = monthly_data[monthly_data['handwashing_started'] == False]['death_rate']
after = monthly_data[monthly_data['handwashing_started'] == True]['death_rate']
boot_mean_diff = []
for i in range(3000):
    # Resample each period with replacement, keeping the original sample sizes
    boot_before = before.sample(frac=1, replace=True)
    boot_after = after.sample(frac=1, replace=True)
    boot_mean_diff.append(boot_after.mean() - boot_before.mean())
# Calculating a 95% confidence interval from boot_mean_diff
confidence_interval = pd.Series(boot_mean_diff).quantile([0.025, 0.975])
confidence_interval

# Visualize yearly death rates
plt.figure(figsize=(10, 5))
for clinic in yearly_data['clinic'].unique():
    subset = yearly_data[yearly_data['clinic'] == clinic]
    plt.plot(subset['year'], subset['death_rate'], marker='o', label=clinic)
plt.xlabel("Year")
plt.ylabel("Death Rate")
plt.title("Yearly Death Rate by Clinic")
plt.legend()
plt.show()
# Visualize monthly death rate before and after handwashing
plt.figure(figsize=(10, 5))
plt.plot(monthly_data['date'], monthly_data['death_rate'], marker='o', linestyle='-', label='Death Rate')
plt.axvline(handwashing_start, color='red', linestyle='--', label='Handwashing Introduced')
plt.xlabel("Date")
plt.ylabel("Death Rate")
plt.title("Monthly Death Rate Before and After Handwashing")
plt.legend()
plt.show()
# Split the data into pre- and post-handwashing periods
pre_handwashing = monthly_data[monthly_data['date'] < handwashing_start]
post_handwashing = monthly_data[monthly_data['date'] >= handwashing_start]
# Calculate average death rates before and after handwashing
avg_death_rate_pre = pre_handwashing['death_rate'].mean()
avg_death_rate_post = post_handwashing['death_rate'].mean()
# Bar plot of average death rates
plt.figure(figsize=(8, 6))
plt.bar(['Pre-Handwashing', 'Post-Handwashing'], [avg_death_rate_pre, avg_death_rate_post], color=['red', 'green'])
plt.title('Average Death Rates Before and After Handwashing')
plt.ylabel('Death Rate')
plt.show()

Conclusion
Summary of findings:
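In the yearly data (1841-1846), before handwashing was introduced, Clinic 1 had a higher death rate than Clinic 2.
After the introduction of handwashing on June 1, 1847, the average monthly death rate in Clinic 1 dropped markedly.
The bootstrap analysis confirms that the reduction is statistically significant: the 95% confidence interval for the difference in mean monthly death rates (after minus before) lies entirely below zero.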
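
The significance check behind the last finding can be sketched as a reusable function. This is a minimal illustration, not the notebook's own code: the function name `bootstrap_mean_diff_ci`, its parameters, and the fixed seed are assumptions, and the `toy_before` and `toy_after` values are made-up rates rather than the hospital data. It resamples with NumPy instead of `Series.sample`, but follows the same percentile-bootstrap idea.

```python
import numpy as np


def bootstrap_mean_diff_ci(before, after, n_boot=3000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(after) - mean(before)."""
    rng = np.random.default_rng(seed)  # fixed seed so the sketch is repeatable
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement at its original size.
        boot_before = rng.choice(before, size=before.size, replace=True)
        boot_after = rng.choice(after, size=after.size, replace=True)
        diffs[i] = boot_after.mean() - boot_before.mean()
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Made-up illustrative death rates; NOT the Vienna General Hospital data.
toy_before = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10]
toy_after = [0.02, 0.03, 0.01, 0.02, 0.03, 0.02]

lower, upper = bootstrap_mean_diff_ci(toy_before, toy_after)

# An interval entirely below zero supports a real reduction after the change.
reduction_supported = upper < 0
```

With the real data, `before` and `after` would be the monthly `death_rate` series split at `handwashing_start`.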