A/B Testing in R

Introduction to A/B Testing in R

Today, we analyze a fictional dataset spanning seven months, from July 2022 to January 2023, in which an A/B test was introduced in October 2022.

You can consult the solution for this live training in notebook-solution.ipynb.

The dataset is fictional and was created in data_creation.ipynb.

library(tidyverse)
# lmtest and sandwich are not directly available in Workspace, so they are installed from source archives here.
# Locally, install.packages("lmtest") and install.packages("sandwich") should work instead.
install.packages("lmtest_0.9-40.tar.gz", repos = NULL, type = "source")
install.packages("sandwich_3.0-2.tar.gz", repos = NULL, type = "source")
library(lmtest)
library(sandwich)

Task 1

Load experiment_data.csv with read_csv and look at some random rows.

Month is the time dimension, ranging from July 2022 to January 2023.

Group indicates whether a customer belongs to the treatment group (New) or the control group (Existing).

Treated is always 0 for the control (Existing) group, and also for the treatment (New) group before October, that is, before the experiment was implemented.

Dollars is the amount in $ spent by the customer.

id is a personal identifier of the customer.

library(tidyverse)
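df <- read_csv("experiment_data.csv")
# Show 20 randomly chosen rows
df[sample(nrow(df), 20), ]
# Treat the treatment flag as categorical so that summary() reports counts per level
df$Treated <- as.factor(df$Treated)
#df$Month <- as.factor(df$Month)
summary(df)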
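
The column descriptions above imply a rule that can be checked directly: Treated should never be non-zero for an Existing customer, nor for any row before October 2022. Below is a minimal sketch of such a check. It runs on a made-up toy tibble rather than on experiment_data.csv, and it assumes that Month is stored as a YYYYMM number and that Treated is coded 0/1; both are assumptions about the file, not something confirmed here.

```r
library(dplyr)

# Toy stand-in for experiment_data.csv; all values are invented for illustration
toy <- tibble(
  id      = c(1, 1, 2, 2, 3, 3),
  Month   = c(202209, 202211, 202209, 202211, 202209, 202211),  # assumed YYYYMM coding
  Group   = c("New", "New", "Existing", "Existing", "New", "New"),
  Treated = c(0, 1, 0, 0, 0, 1),                                # assumed 0/1 coding
  Dollars = c(10, 14, 9, 11, 12, 15)
)

# Rows that would contradict the description:
# treated although in the control group, or treated before October 2022
violations <- toy %>%
  filter(Treated != 0, Group == "Existing" | Month < 202210)

nrow(violations)
```

To run the same check on the real data, replace toy with df. Any rows returned would point to a mismatch between the data and the column description.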

Task 2

Look at the data to see the number of customers we observe per month in each group. How many individual customers are there? Also look at the Treated column by Month.

# Number of observations per month and group
df %>% count(Month, Group)
cat("Rows and unique rows in the dataset:\n")
nrow(df)
nrow(unique(df))
cat("\nUnique/distinct months in the dataset:\n")
table(df$Month)
cat("\nUnique/distinct customers in the dataset:\n")
length(unique(df$id))

cat("\nNumber of rows and distinct clients by group ('New' vs 'Existing'):\n")
df %>% group_by(Group) %>% summarise(n = n(), clients = n_distinct(id))
cat("\n")

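# Treatment status by month
table(df$Month, df$Treated)
# Number of observations and mean spend per customer, most frequently observed first
df %>% group_by(id) %>% summarise(n = n(), mean = mean(Dollars)) %>% arrange(desc(n))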
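
Because each customer can appear in several months, the number of rows and the number of distinct customers are different quantities. The sketch below shows one way to report both per Month and Group. It uses an invented toy tibble with the same column names, so the numbers say nothing about the real dataset.

```r
library(dplyr)

# Toy panel; all values are invented for illustration
toy <- tibble(
  id    = c(1, 1, 2, 2, 3, 3),
  Month = c(202209, 202211, 202209, 202211, 202209, 202211),
  Group = c("New", "New", "Existing", "Existing", "New", "New")
)

# Rows versus distinct customers in every Month x Group cell
customers_per_cell <- toy %>%
  group_by(Month, Group) %>%
  summarise(rows = n(), customers = n_distinct(id), .groups = "drop")

customers_per_cell

# Distinct customers overall
n_distinct(toy$id)
```

If rows and customers agree in every cell, each customer is observed at most once per month, which is what a monthly panel would be expected to look like.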

Task 3

Aggregate the whole dataset by Month and Group and look at the Dollars spent with a line plot.

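# Treated was converted to a factor in Task 1, so turn it back into a number before averaging
month_group_data <- df %>%
  group_by(Month, Group) %>%
  summarize(Dollars = mean(Dollars), Treatment = mean(as.numeric(as.character(Treated))), .groups = "drop")
month_group_data %>% arrange(Month, Group)
# Line plot of mean spend per month, one line per group
ts_plot <- ggplot(month_group_data, aes(x = as.factor(Month), y = Dollars, color = Group, group = Group))
ts_plot + geom_line() + geom_point(size = 3)
# Distribution of spend per month and group
df %>% ggplot(aes(x = as.factor(Month), y = Dollars, color = Group)) + geom_boxplot()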
# Drop October, because some customers in the 'New' group already saw the new product while others still saw the old one
customer_data <- df %>% filter(Month != "202210")
# Add a binary indicator for the actual A/B testing period (November to January)
customer_data$AB_period <- ifelse(customer_data$Month %in% c("202211", "202212", "202301"), 1, 0)
# Check the indicator against Month
table(customer_data$Month, customer_data$AB_period)
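
With Group and AB_period in place, the data fall into four cells: each group before and during the test period. One possible way to use the indicator, shown only as a sketch and not necessarily the approach taken in the solution notebook, is to compare the change in mean Dollars between the two groups. The toy tibble below is invented and only mimics the column names.

```r
library(dplyr)

# Toy data: 2 'New' and 2 'Existing' customers, each seen once before and once during the test
toy <- tibble(
  id        = rep(1:4, each = 2),
  Group     = rep(c("New", "New", "Existing", "Existing"), each = 2),
  AB_period = rep(c(0, 1), times = 4),
  Dollars   = c(10, 15, 12, 17, 9, 10, 11, 12)   # invented values
)

# Mean spend in each Group x AB_period cell
cell_means <- toy %>%
  group_by(Group, AB_period) %>%
  summarise(mean_dollars = mean(Dollars), .groups = "drop")

# Change from the pre-period to the test period within each group
change <- cell_means %>%
  group_by(Group) %>%
  summarise(change = mean_dollars[AB_period == 1] - mean_dollars[AB_period == 0])

# Difference between the two changes
diff_in_changes <- change$change[change$Group == "New"] -
  change$change[change$Group == "Existing"]

diff_in_changes
```

This comparison of raw means carries no measure of uncertainty. A regression with suitable standard errors would be needed before drawing conclusions from the real data.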