Exploring hospital readmission rates using age, the specific role of diabetes and allround diagnosis [Competition]

    Background

    We've been hired as a consulting company helping a hospital group better understand patient readmissions. The hospital gave us access to ten years of information on patients readmitted to the hospital after being discharged. The doctors want us to assess if initial diagnoses, diabetes diagnosis or other variables could help them better understand the probability of readmission.

    They want to focus follow-up calls and attention on those patients with a higher probability of readmission.

    Research questions

    This notebook focuses on the following questions:

    1. What is the most common diagnosis per age group?
    2. Some doctors believe diabetes might play a central role in readmission. Explore the effect of a diabetes diagnosis on readmission rates.
    3. On what groups of patients should the hospital focus their follow-up efforts to better monitor patients with a high probability of readmission?

    Analysis Methods

    The provided data was analyzed using Jupyter Notebook and Python. All code was hidden to improve the readability of the report.

    We first performed data validation, in which we explored :

    • general information (25000 rows of information for 17 variables),
    • NULL values (none were found),
    • duplicates (none were found),
    • data type statistical summaries (information gathered was dependent on data type),
    • outliers (found for several variables; dropping them would discard 34% of the data, so we decided to keep them).

    Next, we carried out the actual analysis in the following steps:

    • The most common diagnosis per age group was determined by counting, within each age group, how often each diagnosis occurs and selecting the diagnosis with the highest count. This was done separately for the primary, secondary and tertiary diagnosis.
    • The effect of a diabetes diagnosis on readmission was assessed by first computing the readmission rate for all patients, then for all diabetes patients, and finally for patients with diabetes as the primary, secondary or tertiary diagnosis separately.
    • The patient groups to focus on were identified from the overall picture of the patient's health and age. We combined age with the primary, secondary and tertiary diagnosis and determined which unique combinations had the highest readmission rates.
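
    The first step above can be sketched as follows. This is a minimal illustration, not the hidden notebook code: the column names come from the Data Description, while the toy rows and the helper name most_common_per_age are assumptions made for the example.

```python
import pandas as pd

# Toy data for illustration only: the values are made up and are
# not taken from the hospital dataset.
df = pd.DataFrame({
    "age": ["[70-80)", "[70-80)", "[70-80)", "[40-50)", "[40-50)", "[40-50)"],
    "diag_1": ["Circulatory", "Circulatory", "Other", "Other", "Other", "Diabetes"],
})

def most_common_per_age(frame, diag_col):
    """Return the most frequent value of diag_col within each age group."""
    # count how often each diagnosis occurs within each age group
    counts = frame.groupby(["age", diag_col]).size().reset_index(name="n")
    # keep the row with the highest count per age group
    # (on a tie, idxmax keeps the first row it encounters)
    top = counts.loc[counts.groupby("age")["n"].idxmax()]
    return top.set_index("age")[diag_col]

top_primary = most_common_per_age(df, "diag_1")
print(top_primary)
```

    The same helper can be called with "diag_2" and "diag_3" to cover the secondary and tertiary diagnosis.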

    Executive summary

    The most common diagnosis per age group

    We found that both Circulatory Disease and Other are among the most prevalent diagnoses. This was analyzed separately for the primary, secondary and tertiary diagnosis, with the following results:

    The most common primary diagnoses are (sorted in descending order)

    • [70-80) : Circulatory Disease
    • [60-70) : Circulatory Disease
    • [80-90) : Circulatory Disease
    • [50-60) : Circulatory Disease
    • [40-50) : Other
    • [90-100) : Circulatory Disease

    The most common secondary diagnoses are (sorted in descending order)

    • [70-80) : Circulatory Disease
    • [60-70) : Other
    • [80-90) : Other
    • [50-60) : Other
    • [40-50) : Other
    • [90-100) : Other

    The most common tertiary diagnoses are (sorted in descending order)

    • [70-80) : Other
    • [60-70) : Other
    • [80-90) : Other
    • [50-60) : Other
    • [40-50) : Other
    • [90-100) : Other

    Effect of a diabetes diagnosis on readmission rates

    Across all patients, the majority (53%) is not readmitted. The same holds for all diabetes patients (those with diabetes as a primary, secondary or tertiary diagnosis), of whom 53.5% are not readmitted.

    However, breaking the diabetes patients down by the position of the diabetes diagnosis gives a more nuanced view:

    For patients with diabetes as a primary diagnosis

    • 53.6% were readmitted,
    • 46.4% were not readmitted.

    For patients with diabetes as a secondary diagnosis

    • 44.2% were readmitted,
    • 55.8% were not readmitted.

    For patients with diabetes as a tertiary diagnosis

    • 54.3% were readmitted,
    • 45.7% were not readmitted.

    This suggests that diabetes as a primary or tertiary diagnosis is associated with a higher readmission rate, whereas patients with diabetes as a secondary diagnosis are readmitted noticeably less often.

    Conclusion : Taking into account both the general readmission rates and those specifically for diabetes, it could be concluded that while diabetes could contribute to readmission rates for some patients, it does not play a central role in the overall readmission rates.
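
    The rates discussed above could be computed along these lines. This is a sketch under assumptions: the toy rows are invented, and the label "Diabetes" for a diabetes diagnosis in diag_1, diag_2 and diag_3 is assumed rather than confirmed by the report.

```python
import pandas as pd

# Toy data for illustration only, not taken from the hospital dataset.
df = pd.DataFrame({
    "diag_1": ["Diabetes", "Diabetes", "Other", "Circulatory"],
    "diag_2": ["Other", "Circulatory", "Diabetes", "Other"],
    "diag_3": ["Other", "Other", "Other", "Diabetes"],
    "readmitted": ["yes", "no", "no", "yes"],
})

diag_cols = ["diag_1", "diag_2", "diag_3"]
is_readmitted = df["readmitted"].eq("yes")

# overall readmission rate
rates = {"all patients": is_readmitted.mean()}

# readmission rate per diagnosis position
# (assumption: diabetes is labelled "Diabetes")
for col in diag_cols:
    has_diabetes = df[col].eq("Diabetes")
    rates[f"diabetes in {col}"] = is_readmitted[has_diabetes].mean()

# readmission rate for diabetes in any of the three positions
any_diabetes = df[diag_cols].eq("Diabetes").any(axis=1)
rates["diabetes in any position"] = is_readmitted[any_diabetes].mean()

# express as percentages
rates = pd.Series(rates).mul(100).round(1)
print(rates)
```

    Comparing each subgroup with the "all patients" row makes it easy to see whether a given diagnosis position sits above or below the overall rate.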

    On what groups of patients should the hospital focus their follow-up efforts to better monitor patients with a high probability of readmission?

    Taking into account the patient's age and allround diagnosis (meaning primary + secondary + tertiary diagnosis) allowed us to determine which unique combinations had the highest readmission rates:

    • The patient group most prone to readmission is age group [90-100) with an allround diagnosis of Circulatory Disease, with a readmission rate of 12.0%.
    • This is followed by age group [70-80) with an allround diagnosis of Circulatory Disease, with a readmission rate of 8.3%.
    • Third is age group [40-50) with an allround diagnosis of 'Other', with a readmission rate of 8.0%.
    • The top 10 groups are all combinations of 'Circulatory Disease' or 'Other'.
    • Diabetes is present in only 1 of the top 10 groups, which ranks 6th: patients with a primary diagnosis of diabetes and a secondary and tertiary diagnosis of 'Other' have a readmission rate of about 6.4%.
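
    A ranking like the one above can be sketched by grouping on age and the three diagnosis columns. The toy rows are invented, and the rate below is simply readmitted patients divided by patients in the group. The report does not state the denominator behind its percentages, so this definition is an assumption.

```python
import pandas as pd

# Toy data for illustration only, not taken from the hospital dataset.
df = pd.DataFrame({
    "age": ["[90-100)"] * 3 + ["[40-50)"] * 3,
    "diag_1": ["Circulatory"] * 3 + ["Other"] * 3,
    "diag_2": ["Circulatory"] * 3 + ["Other"] * 3,
    "diag_3": ["Circulatory"] * 3 + ["Other"] * 3,
    "readmitted": ["yes", "yes", "no", "yes", "no", "no"],
})

group_cols = ["age", "diag_1", "diag_2", "diag_3"]

summary = (
    df.assign(readmitted=df["readmitted"].eq("yes"))
    .groupby(group_cols)["readmitted"]
    # group size, number of readmissions and readmission rate per combination
    .agg(n_patients="size", n_readmitted="sum", rate="mean")
    .sort_values("rate", ascending=False)
    .reset_index()
)

# the ten combinations with the highest readmission rate
print(summary.head(10))
```

    Keeping n_patients next to the rate helps when reading such a table, because combinations with very few patients can produce extreme rates.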

    Conclusion and recommendation: The hospital should focus primarily on age group [90-100) with an allround diagnosis of Circulatory Disease. Extended follow-up, specialized nursing programs and/or home care could be implemented to help prevent readmissions. These measures could also be applied to other patient groups at risk of readmission. How many of these groups to target is at the discretion of the Physician's Board and Hospital Direction, and the effectiveness of these programs should be assessed further.

    Data Description

    We have access to ten years of patient information (source):

    Information

    • "age" - age bracket of the patient
    • "time_in_hospital" - days (from 1 to 14)
    • "n_procedures" - number of procedures performed during the hospital stay
    • "n_lab_procedures" - number of laboratory procedures performed during the hospital stay
    • "n_medications" - number of medications administered during the hospital stay
    • "n_outpatient" - number of outpatient visits in the year before a hospital stay
    • "n_inpatient" - number of inpatient visits in the year before the hospital stay
    • "n_emergency" - number of visits to the emergency room in the year before the hospital stay
    • "medical_specialty" - the specialty of the admitting physician
    • "diag_1" - primary diagnosis (Circulatory, Respiratory, Digestive, etc.)
    • "diag_2" - secondary diagnosis
    • "diag_3" - additional secondary diagnosis
    • "glucose_test" - whether the glucose serum came out as high (> 200), normal, or not performed
    • "A1Ctest" - whether the A1C level of the patient came out as high (> 7%), normal, or not performed
    • "change" - whether there was a change in the diabetes medication ('yes' or 'no')
    • "diabetes_med" - whether a diabetes medication was prescribed ('yes' or 'no')
    • "readmitted" - if the patient was readmitted at the hospital ('yes' or 'no')

    Acknowledgments

    Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, "Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records," BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.

    Data validation

    Before starting the analysis, the data is validated and, where needed, cleaned so that it is analysis-ready.

    Validating the accuracy, clarity, and details of data is necessary to mitigate project defects. Without validation, we risk basing our decisions or recommendations on flawed data that does not accurately represent the situation at hand.

    Data and libraries import


    General information

    The general information of the dataset allows us to assess whether there are any NULL values and which datatypes are used.
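
    A check of this kind might look like the sketch below. The toy frame stands in for the real dataset, whose loading code is hidden, so the rows are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the hospital dataset (illustrative values only).
df = pd.DataFrame({
    "age": ["[70-80)", "[60-70)", "[40-50)"],
    "time_in_hospital": [8, 3, 5],
    "readmitted": ["no", "yes", "no"],
})

# number of rows and columns
n_rows, n_cols = df.shape

# data type and non-null count per column
df.info()

# number of missing values per column
null_counts = df.isna().sum()
print(null_counts)
```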


    Duplicates

    Duplicated data can distort our analysis, so we assess whether any duplicates are present in the current dataset.
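
    One way to run this check with pandas is sketched below, on an invented toy frame that contains one fully duplicated row.

```python
import pandas as pd

# Toy data for illustration only: the second row repeats the first.
df = pd.DataFrame({
    "age": ["[70-80)", "[70-80)", "[40-50)"],
    "time_in_hospital": [8, 8, 5],
    "readmitted": ["no", "no", "yes"],
})

# rows that are exact copies of an earlier row
n_duplicates = df.duplicated().sum()

# drop the copies, keeping the first occurrence of each row
deduplicated = df.drop_duplicates()

print(f"{n_duplicates} duplicated row(s), {len(deduplicated)} rows kept")
```

    Note that two different patients can legitimately share identical values on all columns, so a duplicate count above zero calls for inspection before anything is dropped.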


    Data type summaries

    Summarizing the statistics per data type helps us verify the validity of the data and double-check some assumptions later in the analysis.
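
    Such summaries could be produced per data type as sketched here. The toy rows are invented, and splitting the columns into numeric and non-numeric is an assumption about how the hidden cells are organised.

```python
import pandas as pd

# Toy data for illustration only, not taken from the hospital dataset.
df = pd.DataFrame({
    "age": ["[70-80)", "[70-80)", "[40-50)"],
    "time_in_hospital": [1, 3, 14],
    "readmitted": ["no", "yes", "no"],
})

# numeric columns: count, mean, std, min, quartiles, max
numeric_summary = df.select_dtypes(include="number").describe()

# non-numeric columns: count, number of unique values, most frequent value
categorical_summary = df.select_dtypes(exclude="number").describe()

print(numeric_summary)
print(categorical_summary)
```

    The min and max rows give a quick check against documented ranges, such as time_in_hospital lying between 1 and 14 days.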


    Outliers

    To further assess whether our data is analysis-ready, we have to look at outlier values. Outliers can distort our view of the data and, consequently, all further calculations in our analysis. Because this could have a significant impact on the quality of our recommendations, we have to make an informed decision on whether to keep outliers. To do this, we examine and visualise the relevant numerical variables.


    This boxplot shows that there are outlier values for all numerical variables. All outliers are situated on the high side (above Q3 + 1.5 × IQR). To assess whether the data is significantly distorted, and whether the outliers would also distort calculations, we'll try dropping the outliers and make a new visualisation.

    # quartiles and interquartile range (IQR) of each numerical column
    # (`numerical` is assumed to be the DataFrame of numeric columns built in the hidden cells)
    q1 = numerical.quantile(0.25)
    q3 = numerical.quantile(0.75)
    IQR = q3 - q1
    
    # bounds of the core data: values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] count as outliers
    lower_bound = q1 - 1.5 * IQR
    upper_bound = q3 + 1.5 * IQR
    
    # select the core data: outlier cells are masked as NaN (rows themselves are not removed)
    data_wo_outliers = numerical[~((numerical < lower_bound) | (numerical > upper_bound))]
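
    The data loss from dropping outliers could be quantified as in the sketch below. It assumes that a row is dropped as soon as any of its numeric values is an outlier, which may differ from how the figure reported in this notebook was obtained, and the toy values are invented.

```python
import pandas as pd

# Toy numeric data for illustration only, not taken from the hospital dataset.
numerical = pd.DataFrame({
    "time_in_hospital": [1, 2, 3, 4, 5, 6, 7, 14],
    "n_medications": [10, 12, 11, 13, 12, 11, 10, 12],
})

# 1.5 * IQR rule, applied per column
q1 = numerical.quantile(0.25)
q3 = numerical.quantile(0.75)
iqr = q3 - q1
is_outlier = (numerical < q1 - 1.5 * iqr) | (numerical > q3 + 1.5 * iqr)

# assumption: a row is dropped if any of its values is an outlier
rows_with_outlier = is_outlier.any(axis=1)
rows_kept = numerical[~rows_with_outlier]

# share of rows that would be lost, in percent
loss_pct = 100 * rows_with_outlier.mean()
print(f"{loss_pct:.1f}% of rows contain at least one outlier")
```

    Because a single outlying value removes the whole row, the loss grows quickly with the number of numeric columns, which is one reason to weigh it against the benefit of a cleaner distribution.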