Reducing hospital readmissions
📖 Background
You work for a consulting company helping a hospital group better understand patient readmissions. The hospital gave you access to ten years of information on patients readmitted to the hospital after being discharged. The doctors want you to assess if initial diagnoses, number of procedures, or other variables could help them better understand the probability of readmission.
They want to focus follow-up calls and attention on those patients with a higher probability of readmission.
💾 The data
You have access to ten years of patient information (source):
Information in the file
- "age" - age bracket of the patient
- "time_in_hospital" - days (from 1 to 14)
- "n_procedures" - number of procedures performed during the hospital stay
- "n_lab_procedures" - number of laboratory procedures performed during the hospital stay
- "n_medications" - number of medications administered during the hospital stay
- "n_outpatient" - number of outpatient visits in the year before a hospital stay
- "n_inpatient" - number of inpatient visits in the year before the hospital stay
- "n_emergency" - number of visits to the emergency room in the year before the hospital stay
- "medical_specialty" - the specialty of the admitting physician
- "diag_1" - primary diagnosis (Circulatory, Respiratory, Digestive, etc.)
- "diag_2" - secondary diagnosis
- "diag_3" - additional secondary diagnosis
- "glucose_test" - whether the glucose serum came out as high (> 200), normal, or not performed
- "A1Ctest" - whether the A1C level of the patient came out as high (> 7%), normal, or not performed
- "change" - whether there was a change in the diabetes medication ('yes' or 'no')
- "diabetes_med" - whether a diabetes medication was prescribed ('yes' or 'no')
- "readmitted" - if the patient was readmitted at the hospital ('yes' or 'no')
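As a minimal sketch of how the columns listed above might be prepared for analysis, the snippet below turns `age` into an ordered category and `readmitted` into a 0/1 flag. The toy rows, the `"[70-80)"` bracket format, the `prepare` helper, and the `readmitted_flag` column are all assumptions made for illustration; the real file may be formatted differently.

```python
import pandas as pd

# Hypothetical toy rows mimicking three of the documented columns.
# The "[70-80)" bracket format is an assumption, not confirmed by the data dictionary.
toy = pd.DataFrame({
    "age": ["[70-80)", "[40-50)", "[90-100)", "[70-80)"],
    "time_in_hospital": [8, 3, 14, 1],
    "readmitted": ["no", "yes", "yes", "no"],
})

def prepare(frame):
    """Return a copy with an ordered age category and a 0/1 readmission flag."""
    out = frame.copy()
    # Order brackets by their lower bound so "[90-100)" sorts after "[70-80)".
    brackets = sorted(
        out["age"].unique(),
        key=lambda b: int(b.strip("[)").split("-")[0]),
    )
    out["age"] = pd.Categorical(out["age"], categories=brackets, ordered=True)
    # Hypothetical helper column: 1 if readmitted, 0 otherwise.
    out["readmitted_flag"] = out["readmitted"].map({"yes": 1, "no": 0})
    return out

prepared = prepare(toy)
print(prepared["age"].cat.categories.tolist())
print(prepared["readmitted_flag"].mean())
```

An ordered category keeps age brackets in their natural order in tables and plots, and a numeric flag makes readmission rates a simple mean.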
Acknowledgments: Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, "Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records," BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.
TABLE OF CONTENTS
- Introduction
- Executive Summary
- Exploratory Data Analysis
- Predictive Analytics
- Discussion
- Conclusion
INTRODUCTION
Hospital readmissions are an indicator of quality of care.
EXECUTIVE SUMMARY
In general
EXPLORATORY DATA ANALYSIS
In this section, we clean, transform, visualize, and explore the data to surface initial insights.
# Import Necessary libraries
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='darkgrid')
# Load the dataset
health_data = pd.read_csv('data/hospital_readmissions.csv')
# View the first few lines
health_data.head()
# Create a different dataframe for Exploration
df = health_data.copy()
# Check the number of observations
print(df.shape)
# Age distribution
df.age.value_counts()
# Check for multicollinearity between numeric variables
# (numeric_only=True skips text columns such as age and diag_1)
df.corr(numeric_only=True)
# Pairwise scatter plots of the numeric variables, coloured by readmission status
grid = sns.pairplot(data=df, hue='readmitted')
plt.show()
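A full correlation matrix can be hard to scan for multicollinearity, so one option is to list only the strongly correlated pairs. The sketch below does this on hypothetical toy data; the `strongest_pairs` helper and the 0.8 threshold are illustrative assumptions rather than part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names follow the data dictionary.
toy = pd.DataFrame({
    "time_in_hospital": [1, 3, 5, 8, 14, 2],
    "n_medications": [4, 9, 15, 22, 40, 7],
    "n_procedures": [0, 1, 0, 3, 1, 2],
    "readmitted": ["no", "yes", "no", "yes", "yes", "no"],
})

def strongest_pairs(frame, threshold=0.8):
    """List numeric column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = frame.select_dtypes(include="number").corr()
    # Keep the strict upper triangle so each pair appears once and the diagonal is dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # NaN entries (if any survive stacking) fail the comparison and are filtered out.
    return pairs[pairs.abs() > threshold].sort_values(ascending=False)

pairs = strongest_pairs(toy)
print(pairs)
```

On the real data, the same call would be `strongest_pairs(df)`; pairs above the threshold are candidates for dropping or combining before modelling.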
In this case study, the patients are adults aged 40 to 100. The 70-80 age bracket is the most common and the 90-100 bracket is the least common.
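Because the goal is to target follow-up at higher-risk patients, the age counts can be paired with a readmission rate per bracket. The following is a minimal sketch on hypothetical toy rows; `readmission_by_group` is an illustrative helper, and the bracket labels are assumed.

```python
import pandas as pd

# Hypothetical toy rows; the bracket labels are an assumed format.
toy = pd.DataFrame({
    "age": ["[40-50)", "[40-50)", "[70-80)", "[70-80)", "[70-80)", "[90-100)"],
    "readmitted": ["no", "yes", "yes", "yes", "no", "no"],
})

def readmission_by_group(frame, group_col):
    """Return patient counts and readmission rate for each level of group_col."""
    # True where the patient was readmitted; the mean of a boolean is the rate.
    flag = frame["readmitted"].eq("yes")
    summary = flag.groupby(frame[group_col]).agg(
        patients="size",
        readmission_rate="mean",
    )
    return summary.sort_index()

age_summary = readmission_by_group(toy, "age")
print(age_summary)
```

Reporting the patient count next to each rate helps avoid over-reading rates from small groups, such as the least common age bracket.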
# `diag_1` represents the primary diagnosis
print('The unique primary diagnoses include:\n', ', '.join(df['diag_1'].unique()))
print('Checking for the most common primary diagnosis by age group...')
# Count each diagnosis within each age bracket, then keep the most frequent one.
# pandas is used here because scipy.stats.mode rejects string data in newer SciPy releases.
diag_counts = df.groupby(['age', 'diag_1']).size().reset_index(name='counts')
prim_diag_age = (
    diag_counts.sort_values('counts', ascending=False)
    .drop_duplicates('age')  # the first row per age bracket has the highest count
    .sort_values('age')
    .rename(columns={'diag_1': 'most_common_diagnosis'})
    .reset_index(drop=True)
)
prim_diag_age
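The single most common diagnosis per age bracket hides how close the runner-up is. One way to see the full picture, sketched here on hypothetical toy rows, is a row-normalised cross-tabulation that gives each diagnosis's share within an age bracket. The toy values and bracket labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy rows; diagnosis names follow the examples in the data dictionary.
toy = pd.DataFrame({
    "age": ["[40-50)", "[40-50)", "[40-50)",
            "[70-80)", "[70-80)", "[70-80)", "[70-80)"],
    "diag_1": ["Diabetes", "Diabetes", "Circulatory",
               "Circulatory", "Circulatory", "Respiratory", "Circulatory"],
})

# normalize="index" divides each row by its total, so every row sums to 1.
shares = pd.crosstab(toy["age"], toy["diag_1"], normalize="index")

# The column with the largest share in each row is the most common diagnosis.
top = shares.idxmax(axis=1)

print(shares.round(2))
print(top)
```

On the real data, replacing `toy` with `df` would give the same table for all age brackets and diagnosis categories.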