Reducing hospital readmissions
📖 Background
You work for a consulting company helping a hospital group better understand patient readmissions. The hospital gave you access to ten years of information on patients readmitted to the hospital after being discharged. The doctors want you to assess whether initial diagnoses, number of procedures, or other variables could help them better understand the probability of readmission.
They want to focus follow-up calls and attention on those patients with a higher probability of readmission.
💾 The data
You have access to ten years of patient information (source):
Information in the file
- "age" - age bracket of the patient
- "time_in_hospital" - days (from 1 to 14)
- "n_procedures" - number of procedures performed during the hospital stay
- "n_lab_procedures" - number of laboratory procedures performed during the hospital stay
- "n_medications" - number of medications administered during the hospital stay
- "n_outpatient" - number of outpatient visits in the year before a hospital stay
- "n_inpatient" - number of inpatient visits in the year before the hospital stay
- "n_emergency" - number of visits to the emergency room in the year before the hospital stay
- "medical_specialty" - the specialty of the admitting physician
- "diag_1" - primary diagnosis (Circulatory, Respiratory, Digestive, etc.)
- "diag_2" - secondary diagnosis
- "diag_3" - additional secondary diagnosis
- "glucose_test" - whether the glucose serum came out as high (> 200), normal, or not performed
- "A1Ctest" - whether the A1C level of the patient came out as high (> 7%), normal, or not performed
- "change" - whether there was a change in the diabetes medication ('yes' or 'no')
- "diabetes_med" - whether a diabetes medication was prescribed ('yes' or 'no')
- "readmitted" - whether the patient was readmitted to the hospital ('yes' or 'no')
Acknowledgments: Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, "Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records," BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.
import pandas as pd
import numpy as np
import pingouin
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('data/hospital_readmissions.csv')
df.head()
💪 Competition challenge
Create a report that covers the following:
- What is the most common primary diagnosis by age group?
- Some doctors believe diabetes might play a central role in readmission. Explore the effect of a diabetes diagnosis on readmission rates.
- On what groups of patients should the hospital focus their follow-up efforts to better monitor patients with a high probability of readmission?
df.describe()
df = df.replace('Missing', np.nan)
print(df.isna().sum())
# Delete missing values from 'diag_1' column
df.dropna(axis=0, subset=['diag_1'], inplace=True)
# Convert 'age' column to category type
cats = pd.CategoricalDtype(['[40-50)', '[50-60)', '[60-70)', '[70-80)', '[80-90)', '[90-100)'], ordered=True)
df['age'] = df['age'].astype(cats)
age_40_50 = df[df['age'] == '[40-50)']
age_40_50_diag = age_40_50['diag_1'].value_counts(normalize=True)
age_50_60 = df[df['age'] == '[50-60)']
age_50_60_diag = age_50_60['diag_1'].value_counts(normalize=True)
age_60_70 = df[df['age'] == '[60-70)']
age_60_70_diag = age_60_70['diag_1'].value_counts(normalize=True)
age_70_80 = df[df['age'] == '[70-80)']
age_70_80_diag = age_70_80['diag_1'].value_counts(normalize=True)
age_80_90 = df[df['age'] == '[80-90)']
age_80_90_diag = age_80_90['diag_1'].value_counts(normalize=True)
age_90_100 = df[df['age'] == '[90-100)']
age_90_100_diag = age_90_100['diag_1'].value_counts(normalize=True)
colors = {'Circulatory':'#e32551', 'Respiratory':'#ffc219', 'Diabetes':'#f07c19', 'Digestive':'#88c100',
'Injury':'#e5d599', 'Musculoskeletal':'#029daf', 'Other':'#949a8e'}
explode = (0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09)
# Plot the primary diagnosis per age group
fig, ax = plt.subplots(2,3, figsize=(10,10))
for i, axi in enumerate(ax.flat):
    # Get the age group and diagnosis counts
    if i == 0:
        age_group = '[40-50)'
        diag_counts = age_40_50_diag
    elif i == 1:
        age_group = '[50-60)'
        diag_counts = age_50_60_diag
    elif i == 2:
        age_group = '[60-70)'
        diag_counts = age_60_70_diag
    elif i == 3:
        age_group = '[70-80)'
        diag_counts = age_70_80_diag
    elif i == 4:
        age_group = '[80-90)'
        diag_counts = age_80_90_diag
    else:
        age_group = '[90-100)'
        diag_counts = age_90_100_diag
    # Get the colors for each diagnosis
    slice_colors = [colors.get(diag, 'gray') for diag in diag_counts.index]
    # Plot the pie chart; trim explode so its length matches the number of slices
    axi.pie(diag_counts, colors=slice_colors, autopct='%1.1f%%', shadow=True, startangle=140,
            explode=explode[:len(diag_counts)], textprops={'fontsize': 7.5})
    axi.set_title('Age {}'.format(age_group))
handles = [plt.Rectangle((0,0),1,1, color=colors[label]) for label in colors]
labels = list(colors.keys())
fig.legend(handles, labels, loc='center left')
fig.suptitle('Primary Diagnosis per Age Group', y=0.90)
plt.show()
According to the charts, the most common primary diagnosis in all age groups is Circulatory. The share of some diagnoses, such as Diabetes and Musculoskeletal, decreases with age, while others, such as Digestive and Respiratory, maintain a similar proportion across age groups.
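The reading of the pie charts can also be checked numerically. The sketch below is one possible way to do it, not the method used above: it builds a row-normalised contingency table with `pd.crosstab` and picks the largest share in each age group. The `toy` DataFrame and its values are made up for illustration; with the real data, `df['age']` and `df['diag_1']` would take its place.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'age' and 'diag_1' columns
toy = pd.DataFrame({
    'age': ['[40-50)', '[40-50)', '[40-50)',
            '[50-60)', '[50-60)', '[50-60)', '[50-60)'],
    'diag_1': ['Circulatory', 'Circulatory', 'Diabetes',
               'Respiratory', 'Circulatory', 'Respiratory', 'Respiratory'],
})

# Share of each primary diagnosis within each age group (rows sum to 1)
shares = pd.crosstab(toy['age'], toy['diag_1'], normalize='index')

# Most common diagnosis per age group and its share
top_diag = shares.idxmax(axis=1)
top_share = shares.max(axis=1)
summary = pd.DataFrame({'top_diagnosis': top_diag, 'share': top_share})
print(summary)
```

A table like `summary` makes it easy to confirm whether a single diagnosis really leads in every age group, which is hard to judge by eye when two pie slices are close in size.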
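# Map the 'yes'/'no' labels to booleans so that the mean of the column is the readmission rate
readmitted_map = {'no': False, 'yes': True}
df['readmitted'] = df['readmitted'].map(readmitted_map)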
df.groupby(['age', 'diag_1'])['readmitted'].mean()
One question that can be asked is whether the primary diagnosis is related to readmission rates.
props_diag = df.groupby('diag_1')['readmitted'].value_counts(normalize=True)
wide_props_diag = props_diag.unstack()
wide_props_diag.plot(kind='bar', stacked=True, color=['#f07c19', '#029daf'])
plt.show()
Diabetes as a primary diagnosis appears to have the highest proportion of readmissions, while Musculoskeletal has the lowest. To check whether the differences in proportions across primary diagnoses are statistically significant, a chi-square test of independence can be performed.
- Null hypothesis (H0) = Primary diagnosis and readmission rates are independent.
- Alternative hypothesis (H1) = Primary diagnosis and readmission rates are associated.
- Significance level (alpha) = 0.05
expected, observed, stat = pingouin.chi2_independence(data=df, x='diag_1', y='readmitted')
print(stat[stat['test'] == 'pearson'])
There is statistically significant evidence (p ≈ 3.47 × 10⁻¹⁶, far below alpha = 0.05) that primary diagnosis and readmission are associated, so the null hypothesis of independence is rejected.
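As a cross-check, the same kind of test can be run with `scipy.stats.chi2_contingency`, which takes a table of observed counts directly. The sketch below uses made-up counts, not the hospital data, and also recomputes the expected counts under independence by hand (row total × column total / grand total) to show what the test compares against. With the real data, the observed table could be built with `pd.crosstab(df['diag_1'], df['readmitted'])`.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical observed counts (rows: primary diagnosis, columns: readmitted no/yes)
observed = pd.DataFrame(
    [[60, 40], [45, 55], [70, 30]],
    index=['Circulatory', 'Diabetes', 'Musculoskeletal'],
    columns=['no', 'yes'],
)

# Pearson chi-square test of independence (no continuity correction)
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1).to_numpy()
col_totals = observed.sum(axis=0).to_numpy()
manual_expected = np.outer(row_totals, col_totals) / observed.to_numpy().sum()

print(f'chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4g}')
print('Reject H0 at alpha = 0.05:', p_value < 0.05)
```

The degrees of freedom equal (rows − 1) × (columns − 1), so a table with seven diagnoses and two readmission outcomes would have six.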