Case Study: How Data Proved Handwashing Saves Lives
This is Dr. Ignaz Semmelweis, a Hungarian physician born in 1818 and active at the Vienna General Hospital. If Dr. Semmelweis looks troubled, it's probably because he's thinking about childbed fever: a deadly disease affecting women who have just given birth. He is thinking about it because, in the early 1840s at the Vienna General Hospital, as many as 10% of the women giving birth die from it. He is thinking about it because he knows the cause of childbed fever: it's the contaminated hands of the doctors delivering the babies. And they won't listen to him and wash their hands!
In this notebook, I'm going to analyze the data that made Semmelweis discover the importance of handwashing. I am going to start by looking at the data that made Semmelweis realize that something was wrong with the procedures at Vienna General Hospital.
# Load the tidyverse package
library(tidyverse)
# Read datasets/yearly_deaths_by_clinic.csv into yearly
yearly <- read_csv("datasets/yearly_deaths_by_clinic.csv")
# Print yearly to have a look at the dataset
print(yearly)
1. The alarming number of deaths
The table above shows the number of women giving birth, and the number who died, at the two clinics of the Vienna General Hospital for the years 1841 to 1846. You'll notice that giving birth was very dangerous; an alarming number of women died as a result of childbirth, most of them from childbed fever.
I'm going to show this more clearly as I analyse the proportion of deaths out of the number of women giving birth.
# Add a new column to yearly with the proportion of deaths per number of births
yearly <- yearly %>%
  mutate(proportion_deaths = deaths / births)
# Print out yearly
yearly
2. Death at the clinics
As I plot the proportion of deaths at both clinic 1 and clinic 2 we are going to see a curious pattern…
# I Set the size of plots in this notebook
options(repr.plot.width=7, repr.plot.height=4)
# Load the ggplot2 package
library(ggplot2)
# I plot yearly proportion of deaths at the two clinics
ggplot(yearly, aes(x = year, y = proportion_deaths, color = clinic, group = clinic)) +
  geom_line() +
  labs(title = "Proportion of Deaths by Year per Clinic",
       x = "Year",
       y = "Proportion of Deaths") +
  theme_minimal()
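The gap between the two lines in the plot can also be condensed into one number per clinic. The following is a minimal sketch, not part of the original analysis: `summarise_by_clinic` is a hypothetical helper name, and it assumes the data frame has `clinic`, `births`, and `deaths` columns as in `yearly`. It pools births and deaths across all years rather than averaging the yearly proportions.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original notebook):
# pooled proportion of deaths per clinic across all years in the data.
summarise_by_clinic <- function(yearly_data) {
  yearly_data %>%
    group_by(clinic) %>%
    summarise(
      total_births = sum(births),
      total_deaths = sum(deaths),
      pooled_proportion_deaths = sum(deaths) / sum(births),
      .groups = "drop"
    )
}

# Usage, assuming `yearly` has been loaded as in the cells above:
# summarise_by_clinic(yearly)
```

Pooling weights each year by its number of births, so a year with few births does not count as much as a busy one.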
3. The handwashing begins
Why is the proportion of deaths consistently so much higher in Clinic 1? Semmelweis saw the same pattern and was puzzled and distressed. The only difference between the clinics was that many medical students served at Clinic 1, while mostly midwife students served at Clinic 2. While the midwives only tended to the women giving birth, the medical students also spent time in the autopsy rooms examining corpses.
Semmelweis started to suspect that something on the corpses, spread from the hands of the medical students, caused childbed fever. So in a desperate attempt to stop the high mortality rates, he decreed: Wash your hands! This was an unorthodox and controversial request, because nobody in Vienna knew about bacteria at this point in time.
Next, I am going to load monthly data from Clinic 1 to see whether the handwashing had any effect.
# I read dataset: "monthly_deaths.csv" into monthly
monthly <- read_csv("datasets/monthly_deaths.csv")
# I added a new column with the proportion of deaths per number of the births
monthly$proportion_deaths <- monthly$deaths / monthly$births
# Printing out the first rows in monthly
head(monthly)
4. The effect of handwashing
With the data loaded, we can now look at the proportion of deaths over time. The plot below does not explicitly mark when obligatory handwashing started, but handwashing reduced the proportion of deaths to such a degree that you should be able to spot it!
# Plot the monthly proportion of deaths (Clinic 1 only)
ggplot(monthly, aes(x = date, y = proportion_deaths)) +
  geom_line() +
  labs(title = "Monthly Proportion of Deaths at Clinic 1",
       x = "Date",
       y = "Proportion of Deaths") +
  theme_minimal()
5. The effect of handwashing highlighted
Starting from the summer of 1847, the proportion of deaths was drastically reduced, and, yes, this was when Semmelweis made handwashing obligatory.
The effect of handwashing is made even clearer as I highlight this in the graph.
# Loaded necessary libraries
library(dplyr)
library(ggplot2)
# From this date, handwashing was made mandatory
handwashing_start <- as.Date("1847-06-01")
# Add a TRUE/FALSE column to the monthly data called handwashing_started
monthly <- monthly %>%
  mutate(handwashing_started = date >= handwashing_start)
# Plot the monthly proportion of deaths before and after handwashing (Clinic 1 only)
ggplot(monthly, aes(x = date, y = proportion_deaths, color = handwashing_started)) +
  geom_line() +
  labs(title = "Monthly Proportion of Deaths at Clinic 1",
       x = "Date",
       y = "Proportion of Deaths",
       color = "Handwashing Started") +
  theme_minimal()
6. More handwashing, fewer deaths?
Again, the previous graph shows that handwashing had a huge effect. Let's also analyze how much it reduced the monthly proportion of deaths on average.
# Calculating the mean proportion of deaths before and after handwashing.
monthly_summary <- monthly %>%
  group_by(handwashing_started) %>%
  summarise(mean_proportion_deaths = mean(proportion_deaths))
# Printing out the summary.
monthly_summary
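The two group means can be turned into a single reduction figure. The following is a small sketch under stated assumptions: `mean_reduction` is a hypothetical helper name that does not appear in the original notebook, and it expects a numeric vector of proportions plus a logical vector shaped like `handwashing_started`.

```r
# Hypothetical helper (an assumption, not from the original notebook):
# mean proportion of deaths before handwashing minus the mean after it.
mean_reduction <- function(proportion_deaths, handwashing_started) {
  before <- mean(proportion_deaths[!handwashing_started])
  after <- mean(proportion_deaths[handwashing_started])
  before - after
}

# Usage, assuming `monthly` has the columns created in the cells above:
# mean_reduction(monthly$proportion_deaths, monthly$handwashing_started)
```

A positive result means the average monthly proportion of deaths was lower after handwashing started.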
7. A statistical analysis of the handwashing data
Handwashing reduced the proportion of deaths by around 8 percentage points, from 10% on average before handwashing to just 2% when handwashing was enforced (which is still a high number by modern standards). To get a sense of the uncertainty surrounding the reduction in mortality rates through handwashing, I will examine a confidence interval (here calculated using a t-test).
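One way to obtain such an interval is sketched below. It is a sketch under stated assumptions rather than the definitive analysis: `handwashing_ci` is a hypothetical wrapper name, and it relies on base R's `t.test()`, which by default runs a Welch two-sample test that does not assume equal variances. The data frame is assumed to have the `proportion_deaths` and `handwashing_started` columns used throughout this notebook.

```r
# Hypothetical wrapper (an assumption, not from the original notebook):
# confidence interval for the difference in mean monthly proportion of
# deaths, computed as the FALSE group (before) minus the TRUE group (after).
handwashing_ci <- function(data, conf_level = 0.95) {
  test_result <- t.test(proportion_deaths ~ handwashing_started,
                        data = data,
                        conf.level = conf_level)
  test_result$conf.int
}

# Usage, assuming `monthly` has the columns created in the cells above:
# handwashing_ci(monthly)
```

Because the grouping variable is logical, the interval describes "before minus after": an interval lying entirely above zero is consistent with lower mortality once handwashing was enforced. Note that the t-test treats the months as independent observations, which is a simplification for time-series data.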