Which version of the website should you use?
Background
You work for an early-stage startup in Germany. Your team has been working on a redesign of the landing page. The team believes a new design will increase the number of people who click through and join your site.
They have been testing the changes for a few weeks. Now they want to measure the impact of the change, and they need you to determine whether the increase could be due to random chance or is statistically significant.
The data
The team assembled the following file:
Redesign test data
- treatment: "yes" if the user saw the new version of the landing page, "no" otherwise.
- new_images: "yes" if the page used a new set of images, "no" otherwise.
- converted: 1 if the user joined the site, 0 otherwise.
The control group is those users with "no" in both columns: the old version with the old set of images.
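The two "yes"/"no" columns together define four experimental groups. As a minimal sketch, here is one way to label those groups and pick out the control group. The `toy` data frame and its values are made up for illustration; only the column names come from the file description above.

```r
# Toy data with the same three columns as the redesign file (values are made up).
toy <- data.frame(
  treatment  = c("no", "no", "yes", "yes", "no", "yes"),
  new_images = c("no", "yes", "no", "yes", "no", "yes"),
  converted  = c(0, 1, 0, 1, 1, 0)
)

# Combine the two flags into one label per group, e.g. "yes/no" means
# new design with the old set of images.
toy$group <- paste(toy$treatment, toy$new_images, sep = "/")

# The control group is "no/no": old design with the old set of images.
control <- toy[toy$treatment == "no" & toy$new_images == "no", ]

# Number of users in each of the four groups.
group_counts <- table(toy$group)
group_counts
```

The same filter can be applied to the real data once it is loaded, by replacing `toy` with the data frame read from the file.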
Primer on A/B testing
A/B testing is a randomized experiment with two variants A and B. It includes the application of statistical hypothesis testing (two-sample inference).
A/B testing is commonly used to test new products or new features. The main principle is to split users into two groups: control and treatment. Then, we evaluate how users respond and decide which version is better.
In our case, we are testing whether the new version of the landing page and the new set of images are worth adopting, that is, whether they actually improve conversion.
An important step in A/B testing is to formulate the hypotheses. This is commonly done as follows:
- Null hypothesis: assumes that the treatments are equal and that any difference between the control and experiment groups is due to chance.
- Alternative hypothesis: assumes that the null hypothesis is wrong and that the outcomes of the control and experiment groups differ by more than chance alone would produce.
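To make the two hypotheses concrete, the sketch below runs a two-sample test of proportions with `prop.test` from base R's stats package. It uses the design conversion counts that appear in the manual calculation later in this analysis. The two-sided alternative and the 0.05 significance level are assumptions made for illustration, and this is only one common way to test such a difference.

```r
# Conversions and total users per design, taken from the counts used
# in the manual conversion-rate calculation in this analysis.
conversions <- c(old = 2223, new = 2366)
visitors    <- c(old = 2223 + 18019, new = 2366 + 17876)

# H0: both designs have the same conversion rate.
# H1: the conversion rates differ (two-sided alternative, an assumption here).
test <- prop.test(x = conversions, n = visitors, alternative = "two.sided")

alpha <- 0.05  # assumed significance level

test$estimate         # observed conversion rate of each design
test$p.value          # probability of a difference at least this large under H0
test$p.value < alpha  # TRUE would mean rejecting H0 at the assumed level
```

By default `prop.test` applies a continuity correction. Passing `correct = FALSE` gives the uncorrected test.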
I'm performing this test in the R programming language.
Install dependencies
install.packages("kableExtra")
install.packages("glue")
# install.packages("ggthemr") # Commented out because 'ggthemr' is not available on CRAN
# Instead, you can install 'ggthemr' from GitHub using the devtools package
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("cttobin/ggthemr")

Load Libraries
library(tidyverse)
library(kableExtra)
library(glue)
library(ggthemr)
ggthemr("flat", type = "outer", layout = "scientific", spacing = 3)

Load Data
df <- readr::read_csv('data/redesign.csv')
head(df)

Analyzing the conversion rate for each of the four groups
What is conversion rate?
Conversion rate is defined as the total number of conversions divided by the number of visitors.
We can either calculate it manually with R, or use the prop.table function in R.
We'll also calculate the uplift for our conversion rates. Uplift is the percent increase or decrease in a metric for users who received the new variant compared with the control group.
Let's start with the A/B conversion rate for the design.
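The two definitions above can be written as small helper functions. This is a minimal sketch: `calc_conversion_rate` and `calc_relative_uplift` are hypothetical names introduced here, and the counts in the example are made up.

```r
# Conversion rate: number of conversions divided by number of visitors.
calc_conversion_rate <- function(conversions, visitors) {
  conversions / visitors
}

# Relative uplift of a variant over the control, returned as a fraction
# (multiply by 100 to express it as a percentage).
calc_relative_uplift <- function(rate_variant, rate_control) {
  (rate_variant - rate_control) / rate_control
}

# Made-up example: 50 of 1000 control users convert, 60 of 1000 variant users.
rate_control <- calc_conversion_rate(50, 1000)
rate_variant <- calc_conversion_rate(60, 1000)
uplift <- calc_relative_uplift(rate_variant, rate_control)

# Conversion rises from 5% to 6%, a relative uplift of about 20%.
round(uplift * 100, 2)
```

Note that the uplift is relative to the control rate: one percentage point more conversions on a 5% baseline is a 20% relative improvement.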
Analysis of old and new design
Conversion count on design bar plot
df %>%
  ggplot(aes(x = factor(treatment), fill = factor(converted))) +
  geom_bar() +
  geom_text(
    stat = "count",
    aes(label = format(after_stat(count), big.mark = ",")),
    position = position_stack(vjust = .5),
    color = "grey20",
    size = 3.3
  ) +
  labs(x = "treatment", fill = "converted", title = "Conversion count on design")

Instead of a bar plot, we can also use a contingency table.
Contingency table of design
prop <- table(df$converted, df$treatment)
prop_sum <- addmargins(prop, 1)
rownames(prop_sum) <- c("no conversion", "conversion", "sum")
colnames(prop_sum) <- c("old design", "new design")
prop_sum

From the bar plot and table, we can tell that the conversion count increased from the old to the new design.
To make this more concrete, let's calculate the conversion rate.
We can use the prop.table function for this, with margin 2 so that each column (design) sums to 1:
prop.table(prop_sum[1:2, ], 2)

The conversion rates are the two values in the bottom row ("conversion"). We could also calculate them manually, as below.
# manual calculation: conversions / (conversions + non-conversions), as a percentage
conv_old_design <- (2223 / (2223 + 18019)) * 100
conv_new_design <- (2366 / (2366 + 17876)) * 100
# relative uplift of the new design over the old one, also as a percentage
relative_uplift <- (conv_new_design - conv_old_design) / conv_old_design * 100
glue("Conversion rate for old design is {round(conv_old_design, 2)}%
Conversion rate for new design is {round(conv_new_design, 2)}%
Given these conversion rates, the relative uplift is {round(relative_uplift, 2)}%")

Analysis on new images
Conversion count on new images bar plot