Which version of the website should you use?
📖 Background
You work for an early-stage startup in Germany. Your team has been working on a redesign of the landing page. The team believes a new design will increase the number of people who click through and join your site.
They have been testing the changes for a few weeks. Now they want to measure the impact of the redesign, and they need you to determine whether the observed increase could be due to random chance or is statistically significant.
💾 The data
The team assembled the following file:
Redesign test data
- "treatment": "yes" if the user saw the new version of the landing page, "no" otherwise.
- "new_images": "yes" if the page used a new set of images, "no" otherwise.
- "converted": 1 if the user joined the site, 0 otherwise.
The control group is those users with "no" in both columns: the old version with the old set of images.
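The two yes/no columns define four groups. The base-R sketch below labels them on a few hypothetical rows (the column names follow the file description; the rows are made up). The letters follow the order that `group_by(treatment, new_images)` produces in the analysis code, so A is the control.

```r
# Hypothetical rows for illustration only; the real data come from data/redesign.csv
users <- data.frame(
  treatment  = c("no", "no", "yes", "yes"),
  new_images = c("no", "yes", "no", "yes"),
  converted  = c(0L, 1L, 1L, 0L)
)

# Label each row with its test group; "no"/"no" is the control
users$group <- ifelse(users$treatment == "no" & users$new_images == "no", "A (CONTROL)",
               ifelse(users$treatment == "no",  "B (old design, new images)",
               ifelse(users$new_images == "no", "C (new design, old images)",
                                                "D (new design, new images)")))
users$group
```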
suppressPackageStartupMessages(library(tidyverse))
df <- readr::read_csv('data/redesign.csv', show_col_types = FALSE)
head(df)
Task
- Analyze the conversion rates for each of the four groups formed by combining the new/old landing-page design with the new/old pictures.
- Can the increases observed be explained by randomness? (Hint: Think A/B test)
- Which version of the website should they use?
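In A/B-test terms, the question is whether a variant's conversion rate differs from the control's by more than chance would explain. A minimal sketch, assuming aggregated counts per group (the counts below are hypothetical, not the team's data), is a two-proportion test with base R's `prop.test()`; its null hypothesis is that both groups share one conversion rate.

```r
# Hypothetical counts (not the real test data): conversions and visitors per group
conversions <- c(control = 100, new_design = 130)
visitors    <- c(control = 1000, new_design = 1000)

# Two-sided two-proportion test without continuity correction
# (equivalent to the pooled two-proportion z-test)
ab <- prop.test(x = conversions, n = visitors, correct = FALSE)
ab$estimate   # observed conversion rate in each group
ab$p.value    # compare against the chosen significance level, e.g. 0.05
```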
The new design combined with the old pictures has the highest observed probability of conversion.
But are those results statistically significant?
Estimating the probability of conversion with a linear regression model (a linear probability model) shows that the website has a baseline conversion rate, that of the control group, of around 10.71%.
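Why the intercept is the control group's rate: both predictors are binary and the model includes their interaction, so it has one parameter per group (it is saturated). Its coefficients are therefore differences of group conversion rates: the intercept is A (control), `treatmentyes` is C minus A, `new_imagesyes` is B minus A, and the interaction is D - B - C + A (group letters as in the analysis code). The sketch below checks this on a tiny hypothetical dataset.

```r
# Hypothetical 0/1 outcomes, four users per group (not the real test data)
d <- data.frame(
  treatment  = rep(c("no", "no", "yes", "yes"), each = 4),
  new_images = rep(c("no", "yes", "no", "yes"), each = 4),
  converted  = c(0, 0, 0, 1,   # A: control, 1/4 converted
                 0, 0, 1, 1,   # B: old design, new images, 2/4
                 0, 1, 1, 1,   # C: new design, old images, 3/4
                 0, 0, 0, 1)   # D: new design, new images, 1/4
)

# Linear probability model with the same formula as the analysis
fit <- lm(converted ~ treatment * new_images, data = d)
coef(fit)

# Observed conversion rates: rows = treatment, columns = new_images
rates <- tapply(d$converted, list(d$treatment, d$new_images), mean)
rates
```

With these rows the coefficients are 0.25 (A), 0.5 (C minus A), 0.25 (B minus A) and -0.75 (the interaction), and the fitted values are exactly the four group rates.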
Only the intercept (the baseline conversion rate) and the new-design term (treatment) are statistically significant. The effects of combining the new design with the new images are close to significant, but they point to an adverse effect on conversion. Therefore, we recommend using the new website design with the old images.
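For a 0/1 outcome, a common cross-check on a linear probability model is logistic regression with the same formula. This is a swapped-in technique, not the method used above; the sketch reuses hypothetical rows, not the team's data. Coefficients are on the log-odds scale, and `predict(..., type = "response")` maps them back to conversion probabilities.

```r
# Hypothetical 0/1 outcomes, four users per group (not the real test data)
d <- data.frame(
  treatment  = rep(c("no", "no", "yes", "yes"), each = 4),
  new_images = rep(c("no", "yes", "no", "yes"), each = 4),
  converted  = c(0, 0, 0, 1,  0, 0, 1, 1,  0, 1, 1, 1,  0, 0, 0, 1)
)

# Logistic regression with the same terms as the linear model
logit_fit <- glm(converted ~ treatment * new_images, family = binomial, data = d)
summary(logit_fit)$coefficients   # estimates, Wald z statistics and p-values

# Back-transform to conversion probabilities for groups A to D
groups <- data.frame(treatment  = c("no", "no", "yes", "yes"),
                     new_images = c("no", "yes", "no", "yes"))
p_hat <- predict(logit_fit, newdata = groups, type = "response")
round(p_hat, 3)
```

Because this model is also saturated, its predicted probabilities reproduce the observed group rates; what changes is the scale on which each term's significance is tested.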
suppressPackageStartupMessages(library(tidymodels))  # loads broom, which provides tidy()
# probability of conversion for each treatment/new_images combination
df2 <- df %>%
  group_by(treatment, new_images) %>%
  summarize(prob = mean(converted), .groups = "drop") %>%
  # group_by() sorts no/no, no/yes, yes/no, yes/yes, so A is the control
  mutate(group = c("A (CONTROL)", LETTERS[2:4]))
df2
# column chart of the conversion probability per group
ggplot(data = df2, aes(x = group, y = prob)) +
  geom_col(aes(fill = group)) +
  geom_text(aes(label = paste0("new design: ", treatment)), vjust = -1.5) +
  geom_text(aes(label = paste0("new images: ", new_images)), vjust = -0.3) +
  # dashed reference line at the lowest group conversion rate
  geom_hline(yintercept = min(df2$prob), lty = 2) +
  labs(x = NULL, y = "Conversion probability") +
  theme_classic()
# linear probability model:
# converted ~ treatment + new_images + treatment:new_images
model <- lm(converted ~ treatment * new_images, data = df)
tidy(model)  # estimate, std.error, statistic and p.value for each term
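The regression tests each coefficient separately. Another way to frame this A/B/n comparison (a swapped-in technique, not the author's method) is to compare the group conversion rates pairwise with two-proportion tests and adjust the p-values for multiple testing. A sketch with base R's `pairwise.prop.test()`, using hypothetical counts rather than the team's data:

```r
# Hypothetical conversions and visitors per group (not the real test data)
conversions <- c(A = 100, B = 110, C = 135, D = 115)
visitors    <- c(A = 1000, B = 1000, C = 1000, D = 1000)

# All six pairwise two-proportion tests with Holm-adjusted p-values;
# extra arguments (here correct = FALSE) are passed on to prop.test()
pw <- pairwise.prop.test(conversions, visitors,
                         p.adjust.method = "holm", correct = FALSE)
pw$p.value   # lower-triangular matrix of adjusted p-values

# Unadjusted p-value for C (new design, old images) against the control A
raw_ca <- prop.test(conversions[c("A", "C")], visitors[c("A", "C")],
                    correct = FALSE)$p.value
c(raw = raw_ca, holm = pw$p.value["C", "A"])
```

Holm's adjustment never lowers a p-value, so a difference that looks significant on its own may not remain so once all six comparisons are accounted for.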