In 1986, a group of urologists in London published a research paper in The British Medical Journal that compared the effectiveness of two different methods to remove kidney stones. Treatment A was open surgery (invasive), and treatment B was percutaneous nephrolithotomy (less invasive). When they looked at the results from 700 patients, treatment B had a higher success rate. However, when they only looked at the subgroup of patients different kidney stone sizes, treatment A had a better success rate. What is going on here? This known statistical phenomenon is called Simpon’s paradox. Simpon's paradox occurs when trends appear in subgroups but disappear or reverse when subgroups are combined.
The Data
Available on kidney_stone_data.csv
| Column | Type | Description |
|---|---|---|
treatment | discrete | Treatment method, indicated by A or B |
stone_size | discrete | Size of the kidney stone, categorized as 'small' or 'large' |
success | discrete | Outcome of the treatment: 1=successful, 0=unsuccessful |
In this project, you are going to explore Simpon’s paradox using multiple regression and other statistical tools. Our main goal is to determine if Treatment A is superior to Treatment B after accounting for the severity of the kidney stones. Let's dive in now!
# Load the necessary packages
library(readr)
library(dplyr)
library(ggplot2)
library(broom)
# Load the data
data <- read_csv("kidney_stone_data.csv")
# Inspect the first five rows
head(data, 5)
# Start coding here...add as many cells as you like!