The importance of handwashing: A data analysis approach

Introduction

Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. This was an unorthodox and controversial request: at the time, nobody in Vienna knew about bacteria.

You will reanalyze the data that made Semmelweis discover the importance of handwashing and its impact on the hospital.

Dataset

The data is stored as two CSV files within the data folder.

yearly_deaths_by_clinic.csv contains the number of births and deaths at the two clinics of the Vienna General Hospital for each year from 1841 to 1846.

Column    Description
year      Years (1841-1846)
births    Number of births
deaths    Number of deaths
clinic    Clinic 1 or clinic 2

monthly_deaths.csv contains data from 'Clinic 1' of the hospital where most deaths occurred.

Column    Description
date      Date (YYYY-MM-DD)
births    Number of births
deaths    Number of deaths

Analysis

In this section, we will analyze the datasets to quantify the effect of handwashing on deaths among women giving birth.

Importing data and loading packages
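
The code for this step is not shown here, so the following is only a minimal sketch of what it might look like. It assumes that the dplyr, ggplot2 and readr packages are installed and that the two CSV files sit in a folder named data, as described in the Dataset section. The helper name load_handwashing_data is hypothetical. The date column is parsed as a Date so that the later plots and date comparisons behave as expected.

```r
# Packages used by the rest of the analysis (assumed to be installed)
library(dplyr)    # %>%, mutate(), group_by(), summarise()
library(ggplot2)  # ggplot(), geom_line(), ggtitle()
library(readr)    # read_csv() and column specifications

# Hypothetical helper: read both CSV files from `data_dir` and
# return them as a named list of data frames.
load_handwashing_data <- function(data_dir = "data") {
  yearly <- read_csv(
    file.path(data_dir, "yearly_deaths_by_clinic.csv"),
    col_types = cols(
      year   = col_integer(),
      births = col_integer(),
      deaths = col_integer(),
      clinic = col_character()
    )
  )
  monthly <- read_csv(
    file.path(data_dir, "monthly_deaths.csv"),
    col_types = cols(
      date   = col_date(format = "%Y-%m-%d"),
      births = col_integer(),
      deaths = col_integer()
    )
  )
  list(yearly = yearly, monthly = monthly)
}

# Usage (assumes the CSV files exist under ./data):
# datasets <- load_handwashing_data("data")
# yearly   <- datasets$yearly
# monthly  <- datasets$monthly
```

With the usage lines uncommented, yearly and monthly are the two data frames that the rest of the analysis refers to.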

Proportion of deaths

Now, we'll calculate the proportion of deaths per number of births for each year and month, respectively.

# Adding a proportion of deaths column to each dataset
yearly$proportion_deaths = yearly$deaths/yearly$births
monthly$proportion_deaths = monthly$deaths/monthly$births

# Line plots
# yearly
ggplot(yearly, aes(x = year, y = proportion_deaths, color = clinic)) +
  geom_line() +
  ggtitle("Yearly proportion of deaths per number of births in both clinics")

# monthly
ggplot(monthly, aes(x = date, y = proportion_deaths)) +
  geom_line() +
  ggtitle("Monthly proportion of deaths per number of births in clinic 1")

Although the plots show a broadly decreasing trend, the proportion of deaths peaks again around 1845-1846. Now let's see what happens when we mark the date on which handwashing became mandatory (June 1st, 1847).

# Adding a boolean column: TRUE for months on or after the start of handwashing
# (the comparison already returns TRUE/FALSE, so no ifelse() is needed)
monthly = monthly %>%
    mutate(handwashing_started = date >= as.Date("1847-06-01"))

# Updating the monthly plot
ggplot(monthly, aes(x = date, y = proportion_deaths, color = handwashing_started)) +
  geom_line() +
  ggtitle(
    label = "Monthly proportion of deaths per number of births in clinic 1",
    subtitle = "Identifying the beginning of handwashing practice"
  )

# Calculating the mean proportion of deaths before and after handwashing
monthly_summary = monthly %>%
    group_by(handwashing_started) %>%
    summarise(mean_proportion_deaths = mean(proportion_deaths))

monthly_summary

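The plot and the summary table both show that the mean monthly proportion of deaths was lower after the handwashing practice began than before it. This is a descriptive comparison of means; no formal significance test has been applied up to this point.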
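
One way to check the before/after difference more formally is Welch's two-sample t-test on the monthly proportions. This is not part of the original analysis but a sketch under stated assumptions: the helper name compare_death_proportions is hypothetical, and the test treats the monthly values as independent observations, which may not hold for a time series like this one.

```r
# Hypothetical helper: Welch's two-sample t-test on the monthly proportions.
# Assumes `monthly` has the columns proportion_deaths (numeric) and
# handwashing_started (logical) created earlier in the analysis.
compare_death_proportions <- function(monthly) {
  # t.test() uses var.equal = FALSE by default, i.e. Welch's test
  t.test(proportion_deaths ~ handwashing_started, data = monthly)
}

# Usage, once `monthly` has been prepared as above:
# result <- compare_death_proportions(monthly)
# result$estimate  # mean proportion per group (FALSE = before, TRUE = after)
# result$conf.int  # 95% confidence interval for the difference (before minus after)
# result$p.value   # p-value for the null hypothesis of equal means
```

A confidence interval that excludes zero would support the visual impression, subject to the independence caveat above.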