Hungarian physician Dr. Ignaz Semmelweis worked at the Vienna General Hospital with childbed fever patients. Childbed fever is a deadly disease affecting women who have just given birth, and in the early 1840s as many as 10% of the women giving birth at the Vienna General Hospital died from it. Dr. Semmelweis discovered that the cause was the contaminated hands of the doctors delivering the babies, and on June 1st, 1847, he decreed that everyone should wash their hands. This was an unorthodox and controversial request, since nobody in Vienna knew about bacteria at the time.
You will reanalyze the data that led Semmelweis to discover the importance of handwashing, and examine its impact on the number of deaths at the hospital.
The data is stored as two CSV files within the data folder.
data/yearly_deaths_by_clinic.csv contains the number of births and deaths at the two clinics at the Vienna General Hospital between the years 1841 and 1846.
| Column | Description |
|---|---|
| year | Years (1841-1846) |
| births | Number of births |
| deaths | Number of deaths |
| clinic | Clinic 1 or clinic 2 |
data/monthly_deaths.csv contains data from 'Clinic 1' of the hospital where most deaths occurred.
| Column | Description |
|---|---|
| date | Date (YYYY-MM-DD) |
| births | Number of births |
| deaths | Number of deaths |
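Because the `date` column is stored as YYYY-MM-DD text, pandas reads it as plain strings unless told otherwise. The following is a minimal sketch of one way to parse it on load, assuming the column layout in the table above. The rows in `sample_csv` are made up for illustration and are not values from the real file, and the names `sample_csv`, `sample_monthly`, and `is_february_or_later` are hypothetical.

```python
import io

import pandas as pd

# Made-up rows for illustration only; these are NOT values from the real file.
sample_csv = io.StringIO(
    "date,births,deaths\n"
    "2000-01-01,100,5\n"
    "2000-02-01,120,6\n"
)

# parse_dates converts the 'date' column from text to datetime64 on load.
sample_monthly = pd.read_csv(sample_csv, parse_dates=["date"])

# A datetime column can be compared against a pd.Timestamp directly.
is_february_or_later = sample_monthly["date"] >= pd.Timestamp("2000-02-01")
```

Parsing is optional here: ISO-formatted date strings also sort and compare correctly as text, so a string comparison gives the same split.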
# Imported libraries
import pandas as pd
import matplotlib.pyplot as plt

# Start coding here
# Use as many cells as you like!

1 - Identify the year with the highest death proportion for each clinic
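The year can be read off a plot, but it can also be found programmatically. The following is a minimal sketch, assuming a table with the `year`, `births`, `deaths`, and `clinic` columns described earlier. The helper `year_of_highest_proportion` and the `toy` data are hypothetical and use made-up numbers, not the hospital records.

```python
import pandas as pd


def year_of_highest_proportion(df: pd.DataFrame) -> pd.DataFrame:
    """Return, for each clinic, the row with the largest deaths / births ratio."""
    df = df.copy()
    df["prop_death"] = df["deaths"] / df["births"]
    # idxmax gives the index label of each clinic's maximum proportion.
    top_rows = df.groupby("clinic")["prop_death"].idxmax()
    return df.loc[top_rows, ["clinic", "year", "prop_death"]].reset_index(drop=True)


# Made-up numbers for illustration only; NOT the hospital data.
toy = pd.DataFrame({
    "year": [2000, 2001, 2000, 2001],
    "births": [100, 100, 200, 200],
    "deaths": [5, 10, 8, 4],
    "clinic": ["clinic 1", "clinic 1", "clinic 2", "clinic 2"],
})
result = year_of_highest_proportion(toy)
```

Applied to the real table, the same call would return one row per clinic, which makes the result reproducible instead of depending on a visual reading of the chart.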
# Loading the data
yearly_data = pd.read_csv('data/yearly_deaths_by_clinic.csv')
yearly_data.head()

# Calculate the yearly proportion of deaths
yearly_data['prop_death'] = yearly_data['deaths'] / yearly_data['births']
yearly_data.head()

# Visualize the yearly proportion of deaths
for clinic in yearly_data['clinic'].unique():
    clinic_data = yearly_data[yearly_data['clinic'] == clinic]
    plt.plot(clinic_data['year'], clinic_data['prop_death'], label=clinic)
plt.xlabel('Year')
plt.ylabel('Proportion of Deaths')
plt.title('Year vs Proportion of Deaths by Clinic')
plt.legend(title='Clinic') # adds title to legend
plt.grid(True) # adds a grid to the plot when True
plt.show()

highest_year = 1842

2 - Determine the mean monthly death proportion before and after handwashing
# Loading the data
monthly_data = pd.read_csv('data/monthly_deaths.csv')
monthly_data.head()

# Calculate the monthly proportion of deaths
monthly_data['prop_death'] = monthly_data['deaths'] / monthly_data['births']
monthly_data.head()

# Adding a column/creating a boolean column
threshold = '1847-06-01'  # 'date' is object (string) dtype; ISO-format dates compare correctly as strings
monthly_data['handwashing_started'] = monthly_data['date'] >= threshold
monthly_data.head()

# Calculate the mean before and after handwashing
monthly_summary = monthly_data.groupby('handwashing_started').agg({'prop_death': 'mean'}).reset_index()
monthly_summary

3 - Calculate a 95% confidence interval
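The text so far does not say which quantity the interval is for or which method is used. One common choice is a percentile bootstrap for the difference in mean death proportion (after minus before). The following is a minimal sketch under that assumption. The function `bootstrap_mean_diff_ci`, its parameters, and the toy inputs are hypothetical, and the numbers are made up rather than taken from the hospital data.

```python
import numpy as np


def bootstrap_mean_diff_ci(before, after, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(after) - mean(before)."""
    rng = np.random.default_rng(seed)
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        boot_before = rng.choice(before, size=before.size, replace=True)
        boot_after = rng.choice(after, size=after.size, replace=True)
        diffs[i] = boot_after.mean() - boot_before.mean()
    # For a 95% interval, take the 2.5th and 97.5th percentiles.
    alpha = (1 - level) / 2
    lower, upper = np.quantile(diffs, [alpha, 1 - alpha])
    return lower, upper


# Made-up proportions for illustration only; NOT the hospital data.
toy_before = [0.10, 0.12, 0.08, 0.11, 0.09]
toy_after = [0.02, 0.03, 0.01, 0.02, 0.02]
lower, upper = bootstrap_mean_diff_ci(toy_before, toy_after)
```

With the notebook's data, the inputs would be the `prop_death` values where `handwashing_started` is `False` and `True` respectively. An interval lying entirely below zero would indicate a lower death proportion after handwashing began.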