hospital readmissions of patients

Introduction

Hospital readmissions have become a major focus of healthcare quality improvement efforts in recent years. Readmission rates measure the percentage of patients discharged from a hospital who are admitted again within a certain period of time. High readmission rates are associated with poorer quality of care and can lead to increased costs for health systems, so it is important for hospitals to reduce their readmission rates. Hospitals can use several strategies to reduce readmissions, such as better coordination of care between providers, improved discharge planning, and more effective patient education. By reducing readmission rates, hospitals can improve the quality of care they provide and ultimately improve patient outcomes.

Objective

A hospital group wants to investigate the factors that might influence the probability of patient readmission. The group has given a consulting company access to ten years of data related to readmissions and has asked it to determine whether initial diagnoses, number of procedures, or other variables could provide insight.

💾 The data

You have access to ten years of patient information (source):

Information in the file
  • "age" - age bracket of the patient
  • "time_in_hospital" - days (from 1 to 14)
  • "n_procedures" - number of procedures performed during the hospital stay
  • "n_lab_procedures" - number of laboratory procedures performed during the hospital stay
  • "n_medications" - number of medications administered during the hospital stay
  • "n_outpatient" - number of outpatient visits in the year before a hospital stay
  • "n_inpatient" - number of inpatient visits in the year before the hospital stay
  • "n_emergency" - number of visits to the emergency room in the year before the hospital stay
  • "medical_specialty" - the specialty of the admitting physician
  • "diag_1" - primary diagnosis (Circulatory, Respiratory, Digestive, etc.)
  • "diag_2" - secondary diagnosis
  • "diag_3" - additional secondary diagnosis
  • "glucose_test" - whether the glucose serum came out as high (> 200), normal, or not performed
  • "A1Ctest" - whether the A1C level of the patient came out as high (> 7%), normal, or not performed
  • "change" - whether there was a change in the diabetes medication ('yes' or 'no')
  • "diabetes_med" - whether a diabetes medication was prescribed ('yes' or 'no')
  • "readmitted" - if the patient was readmitted at the hospital ('yes' or 'no')

Acknowledgments: Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, "Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records," BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.

Executive summary

Our consulting company has been tasked with helping a hospital group improve its understanding of patient readmissions. We have been given access to ten years' worth of data on patients' hospital stays, including whether each patient was readmitted after being discharged. Our goal is to assess whether initial diagnoses, the number of procedures, or other variables could provide insight into the probability of readmission, and to identify the patients at higher risk of readmission so that the hospital can focus its follow-up calls and attention accordingly.

To achieve these objectives, we have prepared a report covering the following:

  1. Analysis of the most common primary diagnosis by age group.
  2. Exploration of the impact of a diabetes diagnosis on readmission rates.
  3. Identification of patient groups that the hospital should focus their follow-up efforts on to better monitor patients with a high probability of readmission.
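
As an illustration of the first item, the most common primary diagnosis per age bracket can be obtained with a grouped count. This is only a minimal sketch: the column names `age` and `diag_1` come from the data dictionary, while the rows and the bracket labels below are hypothetical toy values, not records from the dataset.

```python
import pandas as pd

# Hypothetical toy rows for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "age": ["40-50", "40-50", "40-50", "70-80", "70-80", "70-80"],
    "diag_1": ["Circulatory", "Other", "Circulatory",
               "Respiratory", "Circulatory", "Circulatory"],
})

# For each age bracket, count the diagnoses and keep the most frequent one.
top_diag = toy.groupby("age")["diag_1"].agg(lambda s: s.value_counts().idxmax())
print(top_diag)
```

On the real data the same expression would be applied to the full table instead of `toy`; ties, if any, are resolved by whichever label `value_counts` lists first.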

Key findings:

  • Dataset summary:- The most common age bracket among the patients is 70 to 80, the average time spent in the hospital is 4.5 days, the average number of lab procedures per patient is 43.2, most patients did not have an A1C test, most patients were prescribed diabetes medication, and most patients did not have a glucose test.

  • Task 1:- The primary diagnosis 'Circulatory' is the most common among all age groups, followed by 'Other'. This pattern is also observed in secondary and additional secondary diagnoses, indicating that blood flow-related issues are a prevalent concern among the population in the dataset, regardless of age.

  • Task 2:- The analysis of readmission rates in this dataset reveals that patients with a primary diagnosis of diabetes had a higher readmission rate than those diagnosed with other conditions.

    • The chi-square test returns a p-value below the chosen significance threshold, which indicates a statistically significant relationship between a primary diagnosis of diabetes and hospital readmission. We therefore reject the null hypothesis that the two are independent and conclude that a primary diagnosis of diabetes is associated with the hospital readmission rate.
  • Task 3:- Patients aged 50 to 90 have an above-average readmission rate and should therefore be considered at high risk of readmission. The rate peaks in the 70-80 age group. There is a sharp drop among patients over 80, which we attribute to their advanced age.

    • The most important features for predicting diabetes are the number of inpatient visits, the number of medications, and age. While glucose test and diabetes medication were also important, they had a lower than expected impact. Notably, the numbers of inpatient visits and medications were consistently identified as top features across different classifier models. These features are strongly associated with diabetes, glucose test, and circulatory-related health conditions as observed.
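
The chi-square reasoning in Task 2 can be sketched as follows. This is a minimal, hand-rolled Pearson chi-square test of independence for a 2x2 table, using only the standard library. The counts and the 0.05 threshold are assumptions chosen for illustration; they are not the values from this analysis, and the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts
    (no continuity correction). Returns (statistic, p_value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if diagnosis and readmission were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, NOT from the dataset:
# rows = primary diagnosis diabetes / other, columns = readmitted yes / no.
table = [[60, 40], [45, 55]]
alpha = 0.05  # assumed significance threshold

stat, p_value = chi_square_2x2(table)
reject_null = p_value < alpha
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

In practice a library routine such as `scipy.stats.chi2_contingency` would normally be used. Note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ slightly from this uncorrected version.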

Consideration:

Based on the analysis, the hospital should focus its attention on patients diagnosed with diabetes, as they have a higher probability of readmission. Additionally, patients aged 50 to 90, especially those in the 70-80 age group, should be considered at high risk of readmission.
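
One way to flag high-risk age groups is to compare each bracket's readmission rate with the overall rate. The sketch below assumes the `age` and `readmitted` columns from the data dictionary; the rows and bracket labels are hypothetical toy values, so the resulting rates are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "age": ["40-50", "40-50", "40-50", "40-50",
            "70-80", "70-80", "70-80", "70-80"],
    "readmitted": ["no", "no", "no", "yes", "yes", "yes", "no", "yes"],
})

# Turn the 'yes'/'no' flag into booleans so that the mean is a rate.
is_readmitted = toy["readmitted"].eq("yes")

overall_rate = is_readmitted.mean()
rate_by_age = is_readmitted.groupby(toy["age"]).mean()

# Age brackets whose readmission rate exceeds the overall average.
above_average = rate_by_age[rate_by_age > overall_rate]
print(f"Overall rate: {overall_rate:.2f}")
print(above_average)
```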

Recommendation:

The hospital should implement a targeted follow-up system for patients diagnosed with diabetes to reduce the probability of readmission. Additionally, the hospital should develop a system that targets patients in the high-risk age group for follow-up calls and interventions. The system should prioritize patients with the highest probability of readmission based on the analysis of the dataset. The hospital could also consider using the most important features identified for predicting diabetes to develop a risk prediction model that could aid in identifying patients who are at a higher risk of readmission.

# Load dataset
import pandas as pd

data = pd.read_csv('data/hospital_readmissions.csv')
print("Shape:", data.shape)
data.head(5)

Data Validation

This dataset has 25,000 rows and 17 columns. After validation, all variables were consistent with the data dictionary and no modifications to the values were needed:

  • age:- An age grouped into 6 brackets with no missing values, but it had to be fixed for EDA use.

Original Dataset

# Check all variables in the data against the criteria
data.info()

# Summarize each variable: number of unique values, the values themselves, and the dtype
variables = pd.DataFrame(
    [[var, data[var].nunique(), sorted(data[var].unique().tolist()), data[var].dtype]
     for var in data.columns],
    columns=['Variable', 'Number of unique values', 'Values', 'Variable types'],
).set_index('Variable')
variables

Validate the categorical variables

# Share of each category per column, rounded to two decimals
cat = ['age', 'medical_specialty','diag_1','diag_2','diag_3','glucose_test', 'A1Ctest','change', 'diabetes_med', 'readmitted']
for column in cat:
    print(data[column].value_counts(normalize=True).round(2))

Validate the numerical variables

data.describe()
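
Beyond inspecting the summaries by eye, a few of the data-dictionary criteria can be checked programmatically. The following is a minimal sketch, not the validation used in this report: the helper name `check_against_dictionary` and the toy frames are hypothetical, and only the 1 to 14 day range and the 'yes'/'no' columns stated in the data dictionary are tested.

```python
import pandas as pd

def check_against_dictionary(df):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # time_in_hospital is documented as 1 to 14 days (inclusive).
    if not df["time_in_hospital"].between(1, 14).all():
        problems.append("time_in_hospital has values outside 1-14")
    # These columns are documented as 'yes' or 'no' only.
    for col in ["change", "diabetes_med", "readmitted"]:
        unexpected = set(df[col].unique()) - {"yes", "no"}
        if unexpected:
            problems.append(f"{col} has unexpected values: {sorted(map(str, unexpected))}")
    return problems

# Hypothetical toy frames for illustration only.
clean = pd.DataFrame({
    "time_in_hospital": [1, 7, 14],
    "change": ["yes", "no", "no"],
    "diabetes_med": ["yes", "yes", "no"],
    "readmitted": ["no", "yes", "no"],
})
dirty = clean.assign(time_in_hospital=[0, 7, 14], readmitted=["no", "maybe", "no"])

print(check_against_dictionary(clean))
print(check_against_dictionary(dirty))
```

On the real table the call would be `check_against_dictionary(data)`.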

Check the missing values in the columns