In 1986, a group of urologists in London published a research paper in The British Medical Journal that compared the effectiveness of two different methods to remove kidney stones. Treatment A was open surgery (invasive), and treatment B was percutaneous nephrolithotomy (less invasive). When they looked at the results from all 700 patients together, treatment B had a higher success rate. However, when they looked separately at the subgroups of patients with different kidney stone sizes, treatment A had a better success rate in each subgroup. What is going on here? This well-known statistical phenomenon is called Simpson’s paradox. Simpson’s paradox occurs when trends appear in subgroups but disappear or reverse when the subgroups are combined.
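The reversal is easier to see with numbers. The sketch below uses the success counts commonly reported for this study; they are typed in by hand as an assumption and are not read from the project's CSV file, and the names `counts` and `pooled` are illustrative only. It uses base R so that it runs on its own.

```r
# Success counts as commonly reported for the 1986 study
# (assumed here, not read from kidney_stone_data.csv).
counts <- data.frame(
  treatment  = c("A", "A", "B", "B"),
  stone_size = c("small", "large", "small", "large"),
  successes  = c(81, 192, 234, 55),
  total      = c(87, 263, 270, 80)
)

# Success rate within each treatment-by-stone-size subgroup
counts$rate <- counts$successes / counts$total

# Pooled success rate per treatment, ignoring stone size
pooled <- aggregate(cbind(successes, total) ~ treatment, data = counts, FUN = sum)
pooled$rate <- pooled$successes / pooled$total

print(counts)  # A has the higher rate within both small and large stones
print(pooled)  # B has the higher rate once the subgroups are combined
```

With these counts, treatment A was used mostly on large stones (263 of its 350 patients), which are harder to treat, so its pooled rate is pulled down even though it does better within each stone size.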
The Data
Available in kidney_stone_data.csv
Column | Type | Description |
---|---|---|
treatment | discrete | Treatment method, indicated by A or B |
stone_size | discrete | Size of the kidney stone, categorized as 'small' or 'large' |
success | discrete | Outcome of the treatment: 1=successful, 0=unsuccessful |
In this project, you are going to explore Simpson’s paradox using multiple regression and other statistical tools. Our main goal is to determine whether Treatment A is superior to Treatment B after accounting for the severity of the kidney stones. Let's dive in now!
# Load the necessary packages
library(readr)
library(dplyr)
library(ggplot2)
library(broom)
# Load the data
data <- read_csv("kidney_stone_data.csv")
# Inspect the first five rows
head(data, 5)
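# Organize the data: count outcomes per treatment and stone size,
# then compute each outcome's share within its treatment/stone-size group
data %>%
  count(treatment, stone_size, success, name = "N") %>%
  group_by(treatment, stone_size) %>%
  mutate(frequency = round(N / sum(N), 3))
# Success rate per treatment and stone size (the mean of a 0/1 outcome)
data %>%
  group_by(treatment, stone_size) %>%
  summarise(mean = mean(success), .groups = "drop")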
# Test for a confounding variable with a chi-square test, because both variables are categorical
chi_test <- chisq.test(data$stone_size, data$treatment)
# The result shows a relationship between stone size and treatment
tidy(chi_test)
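To make the independence check above more concrete, here is a minimal sketch that runs the same kind of test on a hand-entered 2x2 table. The counts are the patient totals commonly reported for the study and are an assumption, not values read from the CSV; `size_by_treatment`, `res`, and `row_share` are illustrative names.

```r
# Patients per treatment and stone size
# (assumed counts from the published study, not read from the CSV).
size_by_treatment <- matrix(
  c( 87, 263,
    270,  80),
  nrow = 2, byrow = TRUE,
  dimnames = list(treatment = c("A", "B"), stone_size = c("small", "large"))
)

# Chi-square test of independence (Yates' correction is applied by default for 2x2 tables)
res <- chisq.test(size_by_treatment)

# Counts we would expect if treatment choice did not depend on stone size
print(res$expected)
print(res$p.value)

# Share of small and large stones within each treatment
row_share <- prop.table(size_by_treatment, margin = 1)
print(round(row_share, 3))
```

If treatment were assigned independently of stone size, each treatment would have the same mix of small and large stones. A large gap between the observed and expected counts is what flags stone size as a potential confounder.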
# Fit a logistic regression model that accounts for the confounding variable (stone size)
logistic_model <- glm(success ~ stone_size + treatment, data = data, family = "binomial")
tidy(logistic_model)
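The coefficients of a logistic model are on the log-odds scale, which can be hard to read. The sketch below shows one way to turn them into odds ratios with Wald confidence intervals. It refits the same model formula on hand-entered grouped counts (the commonly reported study counts, assumed rather than read from the CSV), and `counts`, `fit`, `odds_ratios`, `ci`, and `p_values` are illustrative names.

```r
# Grouped success/failure counts
# (assumed counts from the published study, not read from the CSV).
counts <- data.frame(
  treatment  = c("A", "A", "B", "B"),
  stone_size = c("small", "large", "small", "large"),
  successes  = c(81, 192, 234, 55),
  total      = c(87, 263, 270, 80)
)
counts$failures <- counts$total - counts$successes

# Binomial GLM on grouped data: the response is cbind(successes, failures)
fit <- glm(cbind(successes, failures) ~ stone_size + treatment,
           data = counts, family = binomial)

# Exponentiating a log-odds coefficient gives an odds ratio
odds_ratios <- exp(coef(fit))

# Wald 95% confidence intervals, moved to the odds-ratio scale
ci <- exp(confint.default(fit))

# p-values for each coefficient
p_values <- summary(fit)$coefficients[, "Pr(>|z|)"]

print(round(cbind(odds_ratio = odds_ratios, ci, p_value = p_values), 3))
```

An odds ratio below 1 for `treatmentB` means lower odds of success than treatment A at the same stone size; whether that difference is statistically significant depends on the p-value and on whether the confidence interval includes 1. With broom, `tidy(logistic_model, exponentiate = TRUE, conf.int = TRUE)` should give a similar table directly from the row-level model.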
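# Conclusions drawn from the model output
# Are small stones associated with a higher chance of success?
small_high_success <- "Yes"
# Is the difference between treatments A and B statistically significant?
A_B_sig <- "No"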