
Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. This was an unorthodox and controversial request, since nobody in Vienna knew about bacteria at the time.

I reanalyzed the data that led Semmelweis to discover the importance of handwashing, and examined the impact of handwashing on the hospital.

The data is stored as two CSV files within the data folder.

yearly_deaths_by_clinic.csv contains the number of women giving birth, and the number of deaths, at each of the two clinics of the Vienna General Hospital for the years 1841 to 1846.

| Column | Description |
|--------|-------------|
| year | Year (1841-1846) |
| births | Number of births |
| deaths | Number of deaths |
| clinic | Clinic 1 or Clinic 2 |

monthly_deaths.csv contains monthly data from Clinic 1 of the hospital, where most deaths occurred.

| Column | Description |
|--------|-------------|
| date | Date (YYYY-MM-DD) |
| births | Number of births |
| deaths | Number of deaths |

# Load libraries (the tidyverse includes readr for read_csv() and ggplot2 for plotting)
library(tidyverse)

# Load the two datasets
monthly <- read_csv('data/monthly_deaths.csv')
yearly <- read_csv('data/yearly_deaths_by_clinic.csv')

When loading the files, read_csv() prints a console message reporting the column specification (the name and parsed type of each column) of the two data frames.
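Because read_csv() guesses the column types, they can also be declared explicitly so the result does not depend on guessing. Below is a minimal base-R sketch. It assumes a tiny made-up file in the same layout as yearly_deaths_by_clinic.csv; the numbers are placeholders, not the hospital data, and `yearly_sample` is a hypothetical name.

```r
# Hypothetical sample in the same layout as yearly_deaths_by_clinic.csv;
# the values are made up for illustration only.
csv_text <- "year,births,deaths,clinic
1841,100,5,clinic 1
1841,90,2,clinic 2"

tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Declare each column's type instead of relying on type guessing;
# names in colClasses are matched to the column names in the header.
yearly_sample <- read.csv(
  tmp,
  colClasses = c(year = "integer", births = "integer",
                 deaths = "integer", clinic = "character")
)

str(yearly_sample)
```

With readr, the same idea is expressed through the col_types argument of read_csv(). For example: `col_types = cols(year = col_integer(), births = col_integer(), deaths = col_integer(), clinic = col_character())`.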

Inspecting the datasets: basic summary statistics, missing values, and a first visualisation of the columns.

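# Preview the first rows of each data frame
head(yearly)
head(monthly)

# Summary statistics and missing-value counts for every column
library(skimr)
skim_without_charts(yearly)
skim_without_charts(monthly)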

The data is complete, with no missing values, and is ready for analysis.
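The completeness check can also be done in base R, without skimr, by counting NA values per column. Here is a sketch on a made-up data frame in the monthly_deaths.csv layout (`monthly_sample` is a hypothetical name). One NA is planted deliberately so the check has something to find.

```r
# Made-up rows in the monthly_deaths.csv layout; not the real data.
monthly_sample <- data.frame(
  date   = as.Date(c("1841-01-01", "1841-02-01", "1841-03-01")),
  births = c(100L, 120L, NA),
  deaths = c(5L, 8L, 6L)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(monthly_sample))
print(na_counts)

# TRUE only when no column contains a missing value.
is_complete <- all(na_counts == 0)
is_complete
```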

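# Births (green) and deaths (red) per year. geom_col() stacks the two clinics'
# rows for each year, and the deaths layer is drawn over the births layer.
ggplot(yearly, aes(x = year)) + 
	geom_col(aes(y = births), fill = "lightgreen", alpha = 0.7) + 
	geom_col(aes(y = deaths), fill = "tomato", alpha = 0.9)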
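Because yearly holds one row per clinic per year, and geom_col() stacks by default, each year's bar in the chart above reaches the sum over both clinics. Below is a base-R sketch of those per-year totals on made-up numbers (`yearly_sample` and `yearly_totals` are hypothetical names).

```r
# Made-up data in the yearly_deaths_by_clinic.csv layout: two clinics per year.
yearly_sample <- data.frame(
  year   = c(1841L, 1841L, 1842L, 1842L),
  births = c(100L, 90L, 110L, 95L),
  deaths = c(10L, 3L, 12L, 4L),
  clinic = c("clinic 1", "clinic 2", "clinic 1", "clinic 2")
)

# Sum births and deaths across clinics for each year; these sums are the
# heights the stacked bars reach for each year.
yearly_totals <- aggregate(cbind(births, deaths) ~ year,
                           data = yearly_sample, FUN = sum)
yearly_totals
```

The combined totals hide the difference between the clinics. This is one reason to split the data by clinic in the next chart.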

Next, the data is transformed to long format to make a clustered bar chart. This operation merges the birth and death counts into a single count column and creates an event column identifying what each count refers to, i.e. births or deaths. The year and clinic columns are left untouched. The result can then be plotted as a clustered bar chart, with one panel per clinic.
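To make the reshaping concrete, here is a base-R sketch of the shape that `pivot_longer(cols = c(births, deaths), names_to = "event", values_to = "count")` produces. It stacks the two measure columns by hand on made-up rows (`wide` and `long` are hypothetical names).

```r
# Made-up wide data: one row per year and clinic.
wide <- data.frame(
  year   = c(1841L, 1841L),
  clinic = c("clinic 1", "clinic 2"),
  births = c(100L, 90L),
  deaths = c(10L, 3L)
)

n <- nrow(wide)

# Each wide row becomes two consecutive long rows: "event" records which
# measure the value came from, and "count" holds the value itself.
long <- data.frame(
  year   = rep(wide$year, each = 2),
  clinic = rep(wide$clinic, each = 2),
  event  = rep(c("births", "deaths"), times = n),
  # rbind() builds a 2 x n matrix; as.vector() reads it column by column,
  # interleaving births[i] and deaths[i] for each row i.
  count  = as.vector(rbind(wide$births, wide$deaths))
)
long
```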

# Reshape: one row per year, clinic and event (births or deaths)
y_long <- pivot_longer(yearly, cols = c(births, deaths), names_to = "event", values_to = "count")
y_long

# Clustered (dodged) bars of births and deaths, one panel per clinic
ggplot(y_long, aes(x = year, y = count, fill = event)) +
	geom_col(position = "dodge") + 
	labs(title = "Births and Deaths Over Years", x = "Year", y = "Count", fill = "Event") +
	facet_wrap(~clinic)

The chart above shows absolute counts. These are not very useful for comparison, because the number of births is not the same across years and clinics. Next, the analysis focuses on the proportion of deaths, i.e. the number of deaths divided by the number of births for each year.

Adding a proportion_deaths column to each dataset:
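Here is a minimal base-R sketch of the calculation. It assumes proportion_deaths is defined as deaths divided by births, as described above. The column name comes from the text; the rows are made up. With dplyr, the same step would typically be written as `mutate(proportion_deaths = deaths / births)`.

```r
# Made-up rows in the yearly_deaths_by_clinic.csv layout.
yearly_sample <- data.frame(
  year   = c(1841L, 1841L),
  births = c(200L, 250L),
  deaths = c(20L, 5L),
  clinic = c("clinic 1", "clinic 2")
)

# Proportion of deaths per birth, computed row by row.
yearly_sample$proportion_deaths <- yearly_sample$deaths / yearly_sample$births
yearly_sample
```

The same expression applies unchanged to the monthly data, since it has the same births and deaths columns.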