
Hungarian physician Dr. Ignaz Semmelweis worked with childbed fever patients at the Vienna General Hospital. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s as many as 10% of the women giving birth at the hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. It was an unorthodox and controversial request: nobody in Vienna knew about bacteria.

You will reanalyze the data that made Semmelweis discover the importance of handwashing and its impact on the hospital.

The data is stored as two CSV files within the data folder.

yearly_deaths_by_clinic.csv contains the number of births and deaths at the two clinics of the Vienna General Hospital between the years 1841 and 1846.

Column | Description
year | Years (1841-1846)
births | Number of births
deaths | Number of deaths
clinic | Clinic 1 or clinic 2

monthly_deaths.csv contains data from 'Clinic 1' of the hospital where most deaths occurred.

Column | Description
date | Date (YYYY-MM-DD)
births | Number of births
deaths | Number of deaths

How much did handwashing reduce monthly death rates on average?

Load the CSV files into yearly and monthly data frames and check the data.

Loading Libraries and Data

suppressWarnings(suppressMessages({
  # Load the tidyverse (readr, dplyr, ggplot2)
  library(tidyverse)

  # Read the two datasets from the data folder
  yearly <- read_csv('data/yearly_deaths_by_clinic.csv')
  monthly <- read_csv('data/monthly_deaths.csv')
}))
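
The task also asks to check the data once it is loaded. A minimal sketch of such a check follows. It uses monthly_toy, a hypothetical stand-in with the same columns as monthly_deaths.csv and made-up values, so that it runs without the CSV files; the same calls should apply to yearly and monthly.

```r
library(tidyverse)

# Hypothetical stand-in with the same columns as monthly_deaths.csv.
# The values are made up for illustration; they are NOT the hospital records.
monthly_toy <- tibble(
  date   = as.Date(c("1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01")),
  births = c(300, 300, 250, 250),
  deaths = c(60, 30, 5, 5)
)

# Column names, types, and the first few values
glimpse(monthly_toy)

# Number of missing values in each column
missing_per_column <- monthly_toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Deaths can never exceed births, so this count should be zero
impossible_rows <- monthly_toy %>%
  filter(deaths > births) %>%
  nrow()
```

Non-zero counts in missing_per_column or impossible_rows would be worth investigating before computing any proportions.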

Manipulating Data

Add a proportion_deaths column to each df, calculating the proportion of deaths per number of births for each year in yearly and month in monthly.

# Add the proportion of deaths per birth to the data frames loaded above
yearly <- yearly %>%
  mutate(proportion_deaths = deaths / births)

monthly <- monthly %>%
  mutate(proportion_deaths = deaths / births)

Visualisations

Create two ggplot line plots: one for the yearly proportion of deaths and another for the monthly proportion of deaths. For the yearly plot, create a different colored line for each clinic.

# Yearly trend of deaths as a proportion of births, one line per clinic
yearly %>%
  ggplot(aes(year, proportion_deaths, color = clinic)) +
  geom_line()

# Monthly trend of deaths as a proportion of births in the same month
monthly %>%
  ggplot(aes(date, proportion_deaths)) +
  geom_line()

Add a handwashing_started boolean column to monthly using June 1st, 1847 as the threshold; TRUE should mean that handwashing has started at the clinic.

# Flag the months from June 1st, 1847 onwards, when handwashing was in place
monthly <- monthly %>%
  mutate(handwashing_started = date >= as.Date("1847-06-01"))
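
With the flag in place, the opening question (how much did handwashing reduce monthly death rates on average?) can be approached by comparing the mean proportion of deaths before and after the threshold. The sketch below is one possible way to do that, not necessarily the intended solution. It uses monthly_toy, a hypothetical stand-in with made-up values, and the names monthly_summary, mean_before, mean_after and mean_diff are illustrative.

```r
library(tidyverse)

# Hypothetical stand-in for monthly; the values are made up, NOT the real data
monthly_toy <- tibble(
  date   = as.Date(c("1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01")),
  births = c(300, 300, 250, 250),
  deaths = c(60, 30, 5, 5)
) %>%
  mutate(
    proportion_deaths   = deaths / births,
    handwashing_started = date >= as.Date("1847-06-01")
  )

# Mean monthly proportion of deaths before and after handwashing started
monthly_summary <- monthly_toy %>%
  group_by(handwashing_started) %>%
  summarise(mean_proportion_deaths = mean(proportion_deaths))

mean_before <- monthly_summary %>%
  filter(!handwashing_started) %>%
  pull(mean_proportion_deaths)

mean_after <- monthly_summary %>%
  filter(handwashing_started) %>%
  pull(mean_proportion_deaths)

# A negative value means the death rate fell after handwashing started
mean_diff <- mean_after - mean_before
```

Running the same pipeline on monthly instead of monthly_toy gives the average change for the actual Clinic 1 records.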

Plot the new df with different colored lines depending on handwashing_started.
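
A minimal sketch of such a plot follows, assuming the same column names as above. monthly_toy is a hypothetical stand-in with made-up values so the block runs on its own, and the axis labels are illustrative choices; replace monthly_toy with monthly to plot the real data.

```r
library(tidyverse)

# Hypothetical stand-in for monthly; the values are made up, NOT the real data
monthly_toy <- tibble(
  date   = as.Date(c("1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01")),
  births = c(300, 300, 250, 250),
  deaths = c(60, 30, 5, 5)
) %>%
  mutate(
    proportion_deaths   = deaths / births,
    handwashing_started = date >= as.Date("1847-06-01")
  )

# Monthly proportion of deaths, colored by whether handwashing had started
handwashing_plot <- monthly_toy %>%
  ggplot(aes(date, proportion_deaths, color = handwashing_started)) +
  geom_line() +
  labs(
    x     = "Date",
    y     = "Proportion of deaths",
    color = "Handwashing started"
  )

handwashing_plot
```

Because color is mapped to a logical column, ggplot2 draws the periods before and after the threshold as two separate lines.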