Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. This was an unorthodox and controversial request: nobody in Vienna knew about bacteria.

You will reanalyze the data that led Semmelweis to discover the importance of handwashing, and measure its impact on the number of deaths at the hospital.

The data is stored as two CSV files within the data folder.

data/yearly_deaths_by_clinic.csv contains the number of births and deaths at the two clinics of the Vienna General Hospital for the years 1841 to 1846.

Column | Description
year | Years (1841-1846)
births | Number of births
deaths | Number of deaths
clinic | Clinic 1 or clinic 2

data/monthly_deaths.csv contains data from 'Clinic 1' of the hospital where most deaths occurred.

Column | Description
date | Date (YYYY-MM-DD)
births | Number of births
deaths | Number of deaths

Load the files

import pandas as pd

# Load the CSV data
yearly_df = pd.read_csv("data/yearly_deaths_by_clinic.csv")
monthly_df = pd.read_csv("data/monthly_deaths.csv")

Highest yearly proportion of deaths

# pandas is already imported above; only matplotlib is new here
import matplotlib.pyplot as plt

# Compute yearly proportion of deaths
yearly_df['proportion_deaths'] = yearly_df['deaths'] / yearly_df['births']

# Visualize proportions by clinic
for clinic, group in yearly_df.groupby('clinic'):
    plt.plot(group['year'], group['proportion_deaths'], label=f'Clinic {clinic}')
plt.xlabel('Year')
plt.ylabel('Proportion of deaths')
plt.title('Yearly Proportion of Deaths by Clinic')
plt.legend()
plt.show()

# Find year with highest proportion of deaths (overall, across both clinics)
highest_row = yearly_df.loc[yearly_df['proportion_deaths'].idxmax()]
highest_year = int(highest_row['year'])
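
The lookup above returns only the single highest row across both clinics. A per-clinic peak can be found in a similar way. The following is a minimal sketch on a small made-up table: the numbers are illustrative and are not the hospital records, and the names `toy`, `peak_idx`, and `peaks` are introduced here only for the example.

```python
import pandas as pd

# Made-up illustrative rows, NOT the values in yearly_deaths_by_clinic.csv
toy = pd.DataFrame({
    "year":   [1841, 1842, 1843, 1841, 1842, 1843],
    "births": [1000, 1000, 1000, 800, 800, 800],
    "deaths": [80, 150, 90, 24, 56, 40],
    "clinic": ["clinic 1"] * 3 + ["clinic 2"] * 3,
})
toy["proportion_deaths"] = toy["deaths"] / toy["births"]

# idxmax within each group returns the index label of that clinic's worst year
peak_idx = toy.groupby("clinic")["proportion_deaths"].idxmax()

# Use those labels to pull the full rows, one per clinic
peaks = toy.loc[peak_idx, ["clinic", "year", "proportion_deaths"]]
print(peaks.to_string(index=False))
```

Applied to `yearly_df`, the same two lines would give one row per clinic rather than one row overall.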

Mean proportion of deaths before and after handwashing

# Parse the date column so it can be compared against the handwashing start date
monthly_df['date'] = pd.to_datetime(monthly_df['date'])

# Add proportion column
monthly_df['proportion_deaths'] = monthly_df['deaths'] / monthly_df['births']

# Add boolean flag for handwashing introduction date
handwashing_start_date = pd.to_datetime('1847-06-01')
monthly_df['handwashing_started'] = monthly_df['date'] >= handwashing_start_date

# Group by handwashing_started flag and compute mean proportion
monthly_summary = monthly_df.groupby('handwashing_started')['proportion_deaths'].mean().reset_index()
print(monthly_summary)
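
Note that the summary above averages the monthly proportions, so every month counts equally regardless of how many births it had. A pooled proportion (total deaths divided by total births) weights every birth equally instead, and the two can differ. A minimal sketch of the distinction, using hypothetical monthly counts that are not taken from monthly_deaths.csv:

```python
# Hypothetical (births, deaths) pairs for three months; illustrative only
months = [(300, 30), (100, 2), (200, 10)]

# Mean of monthly proportions: each month has the same weight
mean_of_props = sum(deaths / births for births, deaths in months) / len(months)

# Pooled proportion: each birth has the same weight
total_births = sum(births for births, _ in months)
total_deaths = sum(deaths for _, deaths in months)
pooled = total_deaths / total_births

print(f"mean of monthly proportions: {mean_of_props:.4f}")
print(f"pooled proportion:           {pooled:.4f}")
```

Either measure can be reasonable; the analysis here uses the mean of monthly proportions, which is what the confidence interval in the next section is built on.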

95% confidence interval for difference in means

# Analyze the difference and compute a 95% confidence interval
import numpy as np

# Split data before and after handwashing using the boolean flag directly
before = monthly_df.loc[~monthly_df['handwashing_started'], 'proportion_deaths']
after = monthly_df.loc[monthly_df['handwashing_started'], 'proportion_deaths']

# Mean difference
mean_diff = after.mean() - before.mean()

# Standard error of the difference (unpooled: each group keeps its own variance)
se_diff = np.sqrt(before.var(ddof=1)/before.shape[0] + after.var(ddof=1)/after.shape[0])

# 95% confidence interval (normal approximation, z = 1.96)
ci_low = mean_diff - 1.96 * se_diff
ci_high = mean_diff + 1.96 * se_diff

# Store as pandas Series
confidence_interval = pd.Series([ci_low, ci_high], index=['lower', 'upper'])
print(confidence_interval)
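
The interval above relies on a normal approximation. One way to cross-check it without that assumption is a percentile bootstrap: resample each group with replacement many times and take the 2.5th and 97.5th percentiles of the resampled differences. The following is a sketch only, using the standard library and made-up proportions; the function name `bootstrap_diff_ci`, the toy lists, and the choice of 2000 resamples are assumptions for illustration and are not part of the original analysis.

```python
import random
import statistics

def bootstrap_diff_ci(before, after, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for mean(after) - mean(before)."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement
        b = rng.choices(before, k=len(before))
        a = rng.choices(after, k=len(after))
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    # With n=40 the first and last cut points are the 2.5% and 97.5% quantiles
    cuts = statistics.quantiles(diffs, n=40)
    return cuts[0], cuts[-1]

# Made-up monthly proportions, NOT the hospital data
toy_before = [0.10, 0.12, 0.08, 0.15, 0.09, 0.11]
toy_after = [0.02, 0.03, 0.01, 0.02, 0.04, 0.02]

lo, hi = bootstrap_diff_ci(toy_before, toy_after)
print(f"bootstrap 95% CI: [{lo:.3f}, {hi:.3f}]")
```

With the real data, the two lists would be `before.tolist()` and `after.tolist()`. If the bootstrap interval and the normal-approximation interval roughly agree, that is some reassurance that the approximation is adequate.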

Conclusion

1️⃣ Yearly Data Observations

  • Clinic 1 recorded the highest proportion of deaths, exceeding 0.14, in 1842.
  • Clinic 2 also peaked in 1842, at over 0.06.
  • Both clinics showed a decline in mortality proportions through 1845, followed by a rise in 1846.

Possible reasons and inferences:

  • The peak in 1842 may reflect poor hygiene practices, overcrowding, or seasonal outbreaks of puerperal fever.
  • The decline toward 1845 may reflect gradual, informal improvements in hygiene awareness or hospital practices, even without a formal protocol.
  • The rise in 1846 could indicate inconsistent practices, staff changes, or the limits of passive measures without a standardized intervention.

2️⃣ Monthly Data on Handwashing Impact

  • Before handwashing (pre-June 1847): Mean proportion of deaths was ~0.105.
  • After handwashing began: Mean proportion of deaths dropped sharply to ~0.021.
  • 95% confidence interval for the difference in mean death proportion (after minus before): [-0.101, -0.067], indicating a strong and statistically significant decrease.

Interpretation:

  • The introduction of handwashing was associated with a dramatic and robust reduction in maternal deaths.
  • The confidence interval lies entirely below zero, so the reduction is very unlikely to be due to chance alone.
  • This supports Semmelweis’s theory that transmission of infection via unclean hands was a major cause of mortality.
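
The "unlikely to be due to chance" reading can also be checked with a permutation test: shuffle the before/after labels many times and count how often a difference at least as large as the observed one appears. The following is a hedged sketch using only the standard library and made-up proportions; `permutation_p_value`, the toy lists, and the 5000 shuffles are assumptions for illustration, not the method used in the analysis above.

```python
import random
import statistics

def permutation_p_value(before, after, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = abs(statistics.fmean(after) - statistics.fmean(before))
    combined = list(before) + list(after)
    n_before = len(before)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any real link between label and value
        rng.shuffle(combined)
        diff = (statistics.fmean(combined[n_before:])
                - statistics.fmean(combined[:n_before]))
        if abs(diff) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)

# Made-up monthly proportions, NOT the hospital data
toy_before = [0.10, 0.12, 0.08, 0.15, 0.09, 0.11]
toy_after = [0.02, 0.03, 0.01, 0.02, 0.04, 0.02]

p_value = permutation_p_value(toy_before, toy_after)
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value says only that the before/after gap is hard to explain by random month-to-month variation; on its own it does not establish that handwashing was the cause.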

3️⃣ Overall Conclusion

  • The data show that maternal mortality was high and variable before antiseptic measures, with marked peaks (e.g., 1842).
  • The introduction of systematic handwashing was followed by a clear, large, and statistically significant reduction in deaths.
  • This historical evidence highlights the critical importance of consistent hygiene protocols in preventing hospital-acquired infections.