Competition - hospital readmissions

Reducing hospital readmissions

📖 Background

You work for a consulting company helping a hospital group better understand patient readmissions. The hospital gave you access to ten years of information on patients readmitted to the hospital after being discharged. The doctors want you to assess if initial diagnoses, number of procedures, or other variables could help them better understand the probability of readmission.

They want to focus follow-up calls and attention on those patients with a higher probability of readmission.

Install and load the packages

# Install the packages (only needed once per environment)
install.packages("tidyverse")
install.packages("broom")
# Load them; tidyverse already attaches dplyr and tidyr
library(tidyverse)
library(broom)

💾 The data and the columns.

You have access to ten years of patient information (source):

Information in the file

"age" - age bracket of the patient
"time_in_hospital" - days (from 1 to 14)
"n_procedures" - number of procedures performed during the hospital stay
"n_lab_procedures" - number of laboratory procedures performed during the hospital stay
"n_medications" - number of medications administered during the hospital stay
"n_outpatient" - number of outpatient visits in the year before a hospital stay
"n_inpatient" - number of inpatient visits in the year before the hospital stay
"n_emergency" - number of visits to the emergency room in the year before the hospital stay
"medical_specialty" - the specialty of the admitting physician
"diag_1" - primary diagnosis (Circulatory, Respiratory, Digestive, etc.)
"diag_2" - secondary diagnosis
"diag_3" - additional secondary diagnosis
"glucose_test" - whether the glucose serum came out as high (> 200), normal, or not performed
"A1Ctest" - whether the A1C level of the patient came out as high (> 7%), normal, or not performed
"change" - whether there was a change in the diabetes medication ('yes' or 'no')
"diabetes_med" - whether a diabetes medication was prescribed ('yes' or 'no')
"readmitted" - whether the patient was readmitted to the hospital ('yes' or 'no')

Acknowledgments: Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, "Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records," BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.

suppressPackageStartupMessages(library(tidyverse))
readmissions <- readr::read_csv('data/hospital_readmissions.csv', show_col_types = FALSE)
readmissions

What questions are we trying to answer?

What is the most common primary diagnosis by age group?
Some doctors believe diabetes might play a central role in readmission. Explore the effect of a diabetes diagnosis on readmission rates.
On what groups of patients should the hospital focus their follow-up efforts to better monitor patients with a high probability of readmission?

Let us get to it.

Inspect the data to check the class of each column

str(readmissions)
summary(readmissions)
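Alongside str() and summary(), it can help to count the missing and distinct values in each column before deciding which columns to treat as categorical. The following is a minimal sketch on a small made-up tibble; the `toy` data and its values are illustrative assumptions, not rows from the hospital file.

```r
library(dplyr)

# Hypothetical toy data standing in for `readmissions`
toy <- tibble(
  age = c("[40-50)", "[50-60)", "[50-60)", NA),
  n_procedures = c(1, 0, 3, 2),
  readmitted = c("yes", "no", "no", "yes")
)

# Number of missing values in each column
missing_per_column <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Number of distinct values in each column (NA counts as its own value)
distinct_per_column <- toy %>%
  summarise(across(everything(), n_distinct))

missing_per_column
distinct_per_column
```

A column with only a handful of distinct values, such as `readmitted` here, is a natural candidate for a factor.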

Converting categorical columns to factors

Everything looks good so far, but the categorical columns are stored as character vectors. I convert them to factors and inspect the data again.

# Convert every categorical column to a factor in one step
readmissions <- readmissions %>%
  mutate(across(
    c(age, medical_specialty, diag_1, diag_2, diag_3,
      glucose_test, A1Ctest, change, diabetes_med, readmitted),
    as.factor
  ))

summary(readmissions)
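One way to confirm that a conversion behaved as expected is to look at the class, levels, and level counts of a converted column. The sketch below uses a made-up vector (`readmitted_raw` is a hypothetical stand-in for the "readmitted" column) and base R only.

```r
# Hypothetical values standing in for the `readmitted` column
readmitted_raw <- c("yes", "no", "no", "yes", "no")
readmitted_fct <- as.factor(readmitted_raw)

class(readmitted_fct)    # confirms the vector is now a factor
levels(readmitted_fct)   # as.factor() sorts the levels, so "no" comes before "yes"
table(readmitted_fct)    # number of observations in each level
```

The level order matters later if the column is used in a model, because the first level is normally treated as the reference category.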

1. What is the most common primary diagnosis by age group?

Using the "diag_1" column, I compare the frequency of each primary diagnosis within each age group. Sorting the counts from highest to lowest and keeping the top 3 shows the most common diagnoses in each age group.

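# Count each primary diagnosis within each age group and add a percentage column
diagnosis_by_age_group <- readmissions %>%
  group_by(age, diag_1) %>%
  # .groups = "drop_last" keeps the result grouped by age, so sum() below is per age group
  summarise(count_of_diag = n(), .groups = "drop_last") %>%
  mutate(
    Percentage = paste0(round(count_of_diag / sum(count_of_diag) * 100, 2), "%")
  )
diagnosis_by_age_group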

Here we see how often each primary diagnosis occurs in each age group. The Percentage column is each diagnosis's share of the patients within its age group.
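The percentage is computed within each age group because summarise() drops only the last grouping variable (diag_1), leaving the result grouped by age. A minimal sketch on made-up data shows this; the `toy` tibble and the `share` column name are illustrative assumptions.

```r
library(dplyr)

# Hypothetical toy data: one row per patient
toy <- tibble(
  age    = c("[40-50)", "[40-50)", "[40-50)", "[50-60)", "[50-60)"),
  diag_1 = c("Circulatory", "Circulatory", "Respiratory", "Digestive", "Digestive")
)

toy_shares <- toy %>%
  group_by(age, diag_1) %>%
  # After summarise(), the data is still grouped by age
  summarise(count_of_diag = n(), .groups = "drop_last") %>%
  # sum(count_of_diag) is therefore the total for the current age group only
  mutate(share = count_of_diag / sum(count_of_diag))

toy_shares
```

In this toy data the shares in each age group add up to 1, which is a quick sanity check that the denominator is the age-group total rather than the overall total.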

# Then keep the top 3 rows per age group to find the 3 most common diagnoses
diagnosis_by_age_groups <- diagnosis_by_age_group %>%
  arrange(desc(count_of_diag)) %>%
  group_by(age) %>%
  slice(1:3)
diagnosis_by_age_groups

This step selects the top 3 primary diagnoses for each age group. Across the age groups, the most common primary diagnoses are Circulatory and Respiratory.
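As an alternative to sorting and slicing, the same top-3 selection could be sketched with dplyr's slice_max(), which picks the rows with the largest values in each group. The `toy_counts` tibble and its counts below are made up for illustration and are not results from the hospital data.

```r
library(dplyr)

# Hypothetical diagnosis counts per age group
toy_counts <- tibble(
  age = rep(c("[40-50)", "[50-60)"), each = 4),
  diag_1 = rep(c("Circulatory", "Respiratory", "Digestive", "Other"), times = 2),
  count_of_diag = c(50, 30, 20, 40, 10, 60, 25, 15)
)

top3_by_age <- toy_counts %>%
  group_by(age) %>%
  # Keep the 3 rows with the highest count in each age group
  slice_max(count_of_diag, n = 3, with_ties = FALSE) %>%
  ungroup()

top3_by_age
```

Setting with_ties = FALSE guarantees exactly three rows per age group; with the default (with_ties = TRUE), tied counts can return more than three.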
