
Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. This was an unorthodox and controversial request, because nobody in Vienna knew about bacteria at the time.

You will reanalyze the data that led Semmelweis to discover the importance of handwashing, and examine its impact on the hospital.

The data is stored as two CSV files within the data folder.

yearly_deaths_by_clinic.csv contains the number of births and deaths at the two clinics of the Vienna General Hospital between the years 1841 and 1846.

| Column | Description |
| --- | --- |
| year | Years (1841-1846) |
| births | Number of births |
| deaths | Number of deaths |
| clinic | Clinic 1 or clinic 2 |

monthly_deaths.csv contains data from 'Clinic 1' of the hospital where most deaths occurred.

| Column | Description |
| --- | --- |
| date | Date (YYYY-MM-DD) |
| births | Number of births |
| deaths | Number of deaths |

# Imported libraries
library(tidyverse, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
library(ggplot2, warn.conflicts = FALSE)
# set working directory
setwd("/work/files/workspace/data")
# load datasets
yearly_deaths_by_clinic <- fread("yearly_deaths_by_clinic.csv", stringsAsFactors = TRUE)
head(yearly_deaths_by_clinic)

monthly_deaths <- fread("monthly_deaths.csv")
head(monthly_deaths)
# Data prep
## check structure of data 
str(yearly_deaths_by_clinic)
str(monthly_deaths)

## check dimension of data
dim(yearly_deaths_by_clinic)
dim(monthly_deaths)
# Data cleaning 
## check for missing data 
sum(is.na(yearly_deaths_by_clinic))
sum(is.na(monthly_deaths))

## check for duplicates
sum(duplicated(yearly_deaths_by_clinic))
sum(duplicated(monthly_deaths))
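
The checks above return a single total per table. If a total were non-zero, a per-column breakdown would show where the problem sits. The following is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not taken from the hospital data):

```r
# Hypothetical toy data with one missing value and one duplicated row;
# these numbers are for illustration only, not the real counts.
toy <- data.frame(
  births = c(100, 120, 120, NA),
  deaths = c(5, 8, 8, 3)
)

# Count missing values per column instead of one overall total
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Show the rows that repeat an earlier row exactly
dup_rows <- toy[duplicated(toy), ]
print(dup_rows)
```

The same two calls can be applied to yearly_deaths_by_clinic and monthly_deaths unchanged.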
# Add a proportion_deaths column to yearly_deaths_by_clinic using mutate()
## Calculate the proportion of deaths each year
yearly <- yearly_deaths_by_clinic %>% mutate(proportion_deaths = deaths / births)

head(yearly)
# Visualize proportion_deaths by year, grouped by clinic
ggplot(yearly, aes(x = year, y = proportion_deaths, group = clinic, color = clinic)) +
  geom_line() +
  xlab("Year") +
  ylab("Proportion of deaths") +
  ggtitle("Comparison of proportion of deaths by year between Clinic 1 and Clinic 2")

**Observations:**

  1. The proportion of deaths in clinic 1 is higher than in clinic 2.
  2. The proportion of deaths shows a decreasing trend from 1842 to 1845.
  3. The proportion of deaths increases from 1845 to 1846.
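
The first observation can also be checked numerically by pooling births and deaths per clinic over all years. Below is a minimal sketch, assuming the same column names as yearly_deaths_by_clinic; the `toy_yearly` values are hypothetical and only illustrate the calculation:

```r
library(dplyr)

# Hypothetical toy values for illustration only; NOT the hospital's real counts.
toy_yearly <- data.frame(
  year   = c(1841, 1842, 1841, 1842),
  births = c(1000, 2000, 1000, 1000),
  deaths = c(100, 200, 30, 50),
  clinic = c("clinic 1", "clinic 1", "clinic 2", "clinic 2")
)

# Pool births and deaths per clinic, then take the ratio.
# Summing first weights each year by its number of births,
# unlike averaging the yearly proportions.
clinic_summary <- toy_yearly %>%
  group_by(clinic) %>%
  summarise(
    total_births = sum(births),
    total_deaths = sum(deaths),
    proportion_deaths = total_deaths / total_births
  )

print(clinic_summary)
```

Replacing `toy_yearly` with the `yearly` table would give one pooled proportion per clinic for 1841 to 1846.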

Recall that Dr. Semmelweis decreed on June 1st, 1847, that everyone should wash their hands. Next, visualize the impact of handwashing in clinic 1, for which we have monthly data with full dates (YYYY-MM-DD).

First, calculate the proportion of deaths in clinic 1 using the monthly_deaths table.

# Add a proportions_deaths column to monthly_deaths using mutate()
## Calculate the proportion of deaths each month
monthly <- monthly_deaths %>% mutate(proportions_deaths = deaths / births)

Flag the months from June 1st, 1847, onward, and use that flag to visualize the impact of handwashing on the proportion of deaths.

# Store the handwashing start date (June 1st, 1847) as a Date
start <- as.Date("1847-06-01")

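# Add a new column to monthly named wash_started
## TRUE for months on or after the start date; as.Date() guards against
## fread having read the date column as character
monthly <- monthly %>%
  mutate(wash_started = as.Date(date) >= start)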
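
As a quick check that the flag splits the data as intended, the months and the pooled proportion of deaths can be summarised per group. This is a minimal sketch on made-up rows: `toy_monthly` and its values are hypothetical, and the start date is assumed to be the June 1st, 1847 decree date given in the text.

```r
library(dplyr)

# Hypothetical toy rows for illustration only; NOT the real monthly counts.
toy_monthly <- data.frame(
  date   = as.Date(c("1847-04-01", "1847-05-01", "1847-06-01", "1847-07-01")),
  births = c(300, 300, 250, 250),
  deaths = c(60, 30, 5, 5)
)

# Assumed start of handwashing, taken from the decree date in the text
start <- as.Date("1847-06-01")

# Flag each month, then summarise both groups side by side
flag_summary <- toy_monthly %>%
  mutate(wash_started = date >= start) %>%
  group_by(wash_started) %>%
  summarise(
    months = n(),
    proportion_deaths = sum(deaths) / sum(births)
  )

print(flag_summary)
```

The row for `wash_started == FALSE` covers the months before the start date, and the `TRUE` row covers the start month and everything after it.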