
Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. This was an unorthodox and controversial request, since nobody in Vienna knew about bacteria at the time.

You will reanalyze the data that led Semmelweis to discover the importance of handwashing, and examine the impact handwashing had on the hospital.

The data is stored as two CSV files within the data folder.

yearly_deaths_by_clinic.csv contains the yearly number of births and deaths at the two clinics of the Vienna General Hospital for the years 1841 to 1846.

Column    Description
year      Years (1841-1846)
births    Number of births
deaths    Number of deaths
clinic    Clinic 1 or clinic 2

monthly_deaths.csv contains data from 'Clinic 1' of the hospital where most deaths occurred.

Column    Description
date      Date (YYYY-MM-DD)
births    Number of births
deaths    Number of deaths

Step 1: Load the CSV files and check the data

First, we need to load the necessary libraries and the data from the CSV files.

# Load necessary libraries
library(tidyverse)

# Load the data
yearly <- read_csv("data/yearly_deaths_by_clinic.csv")
monthly <- read_csv("data/monthly_deaths.csv")

# Check the data
head(yearly)
head(monthly)
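
Beyond looking at the first rows, it may be worth asserting a few basic properties of the loaded data. The sketch below is one way to do that in base R. It uses a small made-up data frame in place of `monthly` (the numbers are placeholders, not the hospital's figures), and `check_deaths_data()` is a hypothetical helper, not part of any package. The bound "deaths cannot exceed births" is an assumed sanity rule, not something the data description guarantees.

```r
# Toy stand-in for `monthly`; these values are invented for illustration
# and are NOT taken from monthly_deaths.csv.
toy_monthly <- data.frame(
  date   = as.Date(c("1841-01-01", "1841-02-01", "1841-03-01")),
  births = c(250, 240, 270),
  deaths = c(30, 18, 12)
)

# Hypothetical helper: stops with an error if a basic assumption is violated.
check_deaths_data <- function(df) {
  stopifnot(
    all(c("births", "deaths") %in% names(df)),  # required columns are present
    !anyNA(df$births), !anyNA(df$deaths),       # no missing counts
    all(df$births > 0),                         # avoids division by zero later
    all(df$deaths >= 0),                        # counts cannot be negative
    all(df$deaths <= df$births)                 # assumed sanity bound
  )
  invisible(TRUE)
}

check_deaths_data(toy_monthly)
```

With the real files loaded, the same helper could be called as `check_deaths_data(yearly)` and `check_deaths_data(monthly)`.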

Step 2: Add a proportion_deaths column

We will add a proportion_deaths column to both data frames, which will represent the proportion of deaths relative to the number of births.

# Add proportion_deaths to yearly data
yearly <- yearly %>%
  mutate(proportion_deaths = deaths / births)

# Add proportion_deaths to monthly data
monthly <- monthly %>%
  mutate(proportion_deaths = deaths / births)

# Check the updated data frames
head(yearly)
head(monthly)
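
As a worked example of the formula, the base-R sketch below computes deaths / births on a few invented numbers. It also shows what happens if a row has zero births, which is an assumption made for illustration (the data description does not say such rows exist): R returns NaN for 0 / 0, so a guarded version that yields NA may be preferable.

```r
# Invented counts for illustration only; the last entry simulates zero births.
births <- c(250, 200, 0)
deaths <- c(25, 10, 0)

# Plain division, as in mutate(proportion_deaths = deaths / births).
proportion_deaths <- deaths / births
proportion_deaths        # 0.10 0.05 NaN

# Guarded variant: report NA instead of NaN when there were no births.
safe_proportion <- ifelse(births > 0, deaths / births, NA_real_)
safe_proportion          # 0.10 0.05 NA
```

The first value, 25 deaths in 250 births, corresponds to the 10% figure mentioned in the introduction.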

Step 3: Create ggplot line plots

Next, we will create two line plots: one for the yearly proportion of deaths and another for the monthly proportion of deaths.

# Yearly proportion of deaths plot
yearly_plot <- ggplot(yearly, aes(x = year, y = proportion_deaths, color = clinic)) +
  geom_line() +
  labs(title = "Yearly Proportion of Deaths by Clinic",
       x = "Year",
       y = "Proportion of Deaths",
       color = "Clinic") +
  theme_minimal()

# Monthly proportion of deaths plot
monthly_plot <- ggplot(monthly, aes(x = as.Date(date), y = proportion_deaths)) +
  geom_line() +
  labs(title = "Monthly Proportion of Deaths in Clinic 1",
       x = "Date",
       y = "Proportion of Deaths") +
  theme_minimal()

# Display the plots
print(yearly_plot)
print(monthly_plot)

Step 4: Add a handwashing_started column

We will add a handwashing_started column to the monthly data frame, indicating whether handwashing had started (TRUE) or not (FALSE) based on the date June 1st, 1847.

# Add handwashing_started column
monthly <- monthly %>%
  mutate(handwashing_started = date >= as.Date("1847-06-01"))

# Plot the monthly data with handwashing_started
handwashing_plot <- ggplot(monthly, aes(x = as.Date(date), y = proportion_deaths, color = handwashing_started)) +
  geom_line() +
  labs(title = "Monthly Proportion of Deaths in Clinic 1 Before and After Handwashing",
       x = "Date",
       y = "Proportion of Deaths",
       color = "Handwashing Started") +
  theme_minimal()

# Display the plot
print(handwashing_plot)
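
To make the cutoff logic concrete, the base-R sketch below applies the same `>=` comparison to four illustrative dates around the threshold. The variable names `handwashing_start` and `example_dates` are introduced here for illustration only. Because the comparison is inclusive, June 1847 itself counts as a handwashing month.

```r
# The date on which handwashing was made mandatory (from the text above).
handwashing_start <- as.Date("1847-06-01")

# Illustrative month-start dates on either side of the threshold.
example_dates <- as.Date(c("1847-04-01", "1847-05-01",
                           "1847-06-01", "1847-07-01"))

# Same comparison as in the mutate() call: TRUE from June 1847 onwards.
handwashing_started <- example_dates >= handwashing_start
handwashing_started      # FALSE FALSE TRUE TRUE
```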

Step 5: Calculate the mean proportion of deaths before and after handwashing

Finally, we will calculate the mean proportion of deaths before and after handwashing and store the results in a 2x2 data frame named monthly_summary.

# Calculate mean proportion of deaths before and after handwashing
monthly_summary <- monthly %>%
  group_by(handwashing_started) %>%
  summarise(mean_proportion_deaths = mean(proportion_deaths))

# Display the summary
print(monthly_summary)
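
The mean of the monthly proportions weights every month equally, regardless of how many births it had. The base-R sketch below, on invented numbers, computes that mean per period, the difference between the two periods, and for comparison a pooled rate (total deaths divided by total births). The data frame `toy` and the pooled rate are additions for illustration, not part of the original analysis.

```r
# Invented data for illustration only; NOT the hospital's figures.
toy <- data.frame(
  births              = c(100, 400, 200, 300),
  deaths              = c(20, 40, 4, 6),
  handwashing_started = c(FALSE, FALSE, TRUE, TRUE)
)
toy$proportion_deaths <- toy$deaths / toy$births

# Mean of monthly proportions per period (what Step 5 computes).
mean_by_period <- sapply(
  split(toy$proportion_deaths, toy$handwashing_started), mean
)

# Estimated reduction: mean before minus mean after.
reduction <- mean_by_period[["FALSE"]] - mean_by_period[["TRUE"]]

# Pooled rate per period: total deaths over total births.
pooled_by_period <- sapply(
  split(toy, toy$handwashing_started),
  function(d) sum(d$deaths) / sum(d$births)
)

mean_by_period
reduction
pooled_by_period
```

In this toy example the unweighted mean before handwashing is 0.15 while the pooled rate is 0.12, because the month with fewer births had the higher death rate. Which figure is more appropriate depends on the question being asked.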

Summary

The monthly_summary data frame will show the mean proportion of deaths before and after the implementation of handwashing. This will give us a clear idea of how much handwashing reduced the monthly death rates on average.

# Output of monthly_summary
monthly_summary