
Assessing the Effectiveness of Medical Treatments

In 1986, a group of urologists in London published a research paper in The British Medical Journal that compared the effectiveness of two methods for removing kidney stones. Treatment A was open surgery (invasive), and treatment B was percutaneous nephrolithotomy (less invasive). When they looked at the results from all 700 patients together, treatment B had a higher success rate. However, when they looked separately at subgroups of patients with different kidney stone sizes, treatment A had the better success rate in each subgroup. What is going on here? This well-known statistical phenomenon is called Simpson's paradox. Simpson's paradox occurs when a trend that appears in every subgroup disappears or reverses when the subgroups are combined.
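The reversal can be reproduced with a few lines of arithmetic. The sketch below uses the success counts as reported in the published 1986 study. They are typed in by hand rather than read from the project's CSV, so treat them as an assumption about what the file contains. The object names `counts` and `pooled` are illustrative only.

```r
# Success counts and group sizes as reported in the 1986 study
# (assumed here; they are NOT read from kidney_stone_data.csv).
counts <- data.frame(
  treatment  = c("A", "A", "B", "B"),
  stone_size = c("small", "large", "small", "large"),
  successes  = c(81, 192, 234, 55),
  patients   = c(87, 263, 270, 80)
)

# Success rate within each treatment x stone-size subgroup
counts$rate <- counts$successes / counts$patients
print(counts)

# Pooled success rate per treatment, ignoring stone size
pooled <- aggregate(cbind(successes, patients) ~ treatment,
                    data = counts, FUN = sum)
pooled$rate <- pooled$successes / pooled$patients
print(pooled)
```

Within each stone size, treatment A has the higher rate, yet the pooled rate favors treatment B. The group sizes suggest why: most treatment A patients had large stones, which are harder to treat, while most treatment B patients had small stones.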

The Data

Available in kidney_stone_data.csv

Column      Type      Description
treatment   discrete  Treatment method, indicated by A or B
stone_size  discrete  Size of the kidney stone, categorized as 'small' or 'large'
success     discrete  Outcome of the treatment: 1=successful, 0=unsuccessful

In this project, you are going to explore Simpson's paradox using multiple regression and other statistical tools. Our main goal is to determine whether Treatment A is superior to Treatment B after accounting for the severity of the kidney stones. Let's dive in now!

1. Load the Necessary Libraries and Data

# Load the necessary packages
library(readr)
library(dplyr)
library(ggplot2)
library(broom)

# Load the data
data <- read_csv("kidney_stone_data.csv")

# Inspect the first five rows
head(data, 5)


2. Data Exploration

Check Data Structure

# Check the structure of the data
str(data)

Descriptive Statistics

# Load the necessary library
if (!requireNamespace("psych", quietly = TRUE)) {
  install.packages("psych")
}
library(psych)

# Use the describe function from the psych package
describe(data)

The descriptive statistics summarize the three variables, treatment, stone_size, and success, across 700 observations. Because treatment and stone_size are categorical, describe() converts them to numeric codes before summarizing them (presumably in alphabetical order of their levels, so A = 1 and B = 2, large = 1 and small = 2). Their means are therefore best read as group proportions.

For the treatment variable, the mean is 1.5 with a standard deviation of 0.5004, indicating an even split between the two treatment groups (A and B). The median and trimmed mean are both 1.5 and the skewness is 0. The kurtosis (-2.0029) is strongly negative, which is what a balanced two-level variable produces; it does not indicate a uniform distribution.

The stone_size variable has a mean of 1.51 and a standard deviation of 0.5003, with minimal skewness (-0.0399) and a similarly negative kurtosis (-2.0013). Under the alphabetical coding assumed above, the median of 2 means that slightly more than half of the patients (about 51%) had small stones.

Lastly, the success variable has a mean of 0.8029, that is, an overall success rate of about 80%, with a standard deviation of 0.3981. The median is 1, so most treatments succeeded. The negative skewness (-1.5192) reflects the same thing: outcomes are concentrated at 1, and the unsuccessful cases form the smaller tail at 0. The kurtosis (0.3085) is only slightly positive and adds little information for a binary variable. The standard error of 0.015 is small, so the overall success rate is estimated fairly precisely.
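Because success is coded 0/1, its summary statistics are all determined by the success proportion and the sample size. The following sketch recomputes the standard deviation, standard error, and skewness from the reported mean. The variable names are illustrative, and the small-sample factor in the skewness line is an assumption about the default estimator used by describe(), so expect agreement only to about three decimal places.

```r
# Values taken from the describe() summary discussed above
n <- 700
p <- 0.8029   # mean of the 0/1 success column = proportion of successes

# Sample standard deviation of a 0/1 variable (n - 1 denominator)
sd_success <- sqrt(p * (1 - p) * n / (n - 1))

# Standard error of the mean
se_success <- sd_success / sqrt(n)

# Skewness of a 0/1 variable; the ((n - 1) / n)^(3/2) factor is an
# assumption about the estimator that psych::describe() uses by default
skew_success <- (1 - 2 * p) / sqrt(p * (1 - p)) * ((n - 1) / n)^(3 / 2)

print(round(c(sd = sd_success, se = se_success, skew = skew_success), 4))
```

The skewness is negative whenever p is above 0.5, which is why a high success rate shows up as negative skew rather than as evidence of many failures.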

3. Data Visualization

Overall Success Rates by Treatment

# Visualize overall success rates by treatment
ggplot(data, aes(x = treatment, y = success)) +
  stat_summary(fun = mean, geom = "bar", fill = "steelblue", width = 0.5) +
  ggtitle("Overall Success Rates by Treatment")

The bar chart shows that, across all patients, Treatment B has a slightly higher success rate (approximately 82.6%) than Treatment A (about 78%) for kidney stone removal. This comparison pools every patient and ignores stone size.
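The heights of the bars can also be read off as numbers instead of being estimated from the plot. Here is a minimal sketch using dplyr. The helper name `success_by_treatment` is hypothetical, and the small `toy` data frame is made up purely to show the shape of the output; it is not the kidney stone data.

```r
library(dplyr)

# Hypothetical helper (not part of the original project): group size and
# success rate for each treatment.
success_by_treatment <- function(df) {
  df %>%
    group_by(treatment) %>%
    summarise(n = n(), success_rate = mean(success), .groups = "drop")
}

# Made-up rows with the same column names as kidney_stone_data.csv,
# used only to demonstrate the helper.
toy <- data.frame(
  treatment = c("A", "A", "A", "A", "B", "B", "B", "B"),
  success   = c(1, 1, 0, 1, 1, 1, 1, 0)
)

rates <- success_by_treatment(toy)
print(rates)
```

Calling `success_by_treatment(data)` on the loaded data set should return the same per-treatment means that stat_summary() draws as bars.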

Success Rates by Treatment and Stone Size