Introduction to A/B Testing in R
Today we analyze a fictional dataset spanning seven months, from July 2022 to January 2023, in which an A/B test was launched in October.
You can consult the solution for this live training in notebook-solution.ipynb.
The dataset is fictional and was created in data_creation.ipynb.
library(tidyverse)
# lmtest and sandwich are not directly available in Workspace, so they are installed from source archives here.
# Locally, install.packages("lmtest") and install.packages("sandwich") should work.
install.packages("lmtest_0.9-40.tar.gz", repos = NULL, type = "source")
install.packages("sandwich_3.0-2.tar.gz", repos = NULL, type = "source")
#
library(lmtest)
library(sandwich)
Task 1
Load experiment_data.csv via read_csv and look at some random rows.
Month shows the time dimension, ranging from July 2022 to January 2023.
Group indicates whether a customer is in the treatment group or not.
Treated is always 0 for the control (Existing) group, as well as for the A (treatment) group before October (prior to implementing the experiment).
Dollars is the amount in dollars spent by our customers.
id is a personal identifier of the customers.
library(tidyverse)
df <- read_csv("experiment_data.csv")
df[sample(1:nrow(df), 20), ]
df$Treated <- as.factor(df$Treated)
#df$Month <- as.factor(df$Month)
summary(df)
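The column descriptions above imply an invariant that is worth checking before any analysis: no row should be treated while belonging to the control group or dating from before October 2022. The following is a minimal sketch on a toy data frame with made-up values. It assumes that Group takes the labels "New" and "Existing" and that Month is stored as a number in YYYYMM form; both are assumptions about the real file.

```r
library(dplyr)

# Toy data (hypothetical values) with the same columns as experiment_data.csv
toy <- tibble(
  id      = c(1, 1, 2, 2, 3, 3),
  Month   = c(202209, 202211, 202209, 202211, 202209, 202211),
  Group   = c("New", "New", "Existing", "Existing", "New", "New"),
  Treated = c(0, 1, 0, 0, 0, 1),
  Dollars = c(10, 14, 9, 9.5, 11, 15)
)

# Rows that would contradict the description: treated although in the
# control group, or treated before the experiment started in October 2022
violations <- toy %>%
  filter(Treated == 1, Group == "Existing" | Month < 202210)

nrow(violations)
```

Running the same filter on df should return zero rows if the data match the description.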
Task 2
Look at the data to see the number of customers we observe per month in each group. How many individual customers are there?
Look at the Treated column by Month.
df %>% group_by(Month) %>% count()
cat("Rows and unique rows in the dataset:\n")
nrow(df)
nrow(unique(df))
cat("\nUnique/distinct months in the dataset:\n")
table(df$Month)
cat("\nUnique/distinct customers in the dataset:\n")
length(unique(df$id))
cat("\nNumber of clients by group ('New' vs 'Existing'):\n")
# Count distinct customers per group (n() would count rows, i.e. customer-months)
df %>% group_by(Group) %>% summarise(n = n_distinct(id))
cat("\n")
table(df$Month, df$Treated)
df %>% group_by(id) %>% summarise(n = n(), mean = mean(Dollars)) %>% arrange(desc(n))
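To answer the "customers per month in each group" question in a single table, one option is to count distinct ids per Month and Group cell. This is a sketch on toy data with made-up values; the group labels "New" and "Existing" are assumed.

```r
library(dplyr)

# Toy data (hypothetical values): three customers observed in two months
toy <- tibble(
  id    = c(1, 1, 2, 2, 3, 3),
  Month = c(202209, 202211, 202209, 202211, 202209, 202211),
  Group = c("New", "New", "Existing", "Existing", "New", "New")
)

# Number of distinct customers in every Month x Group cell
customers_per_cell <- toy %>%
  group_by(Month, Group) %>%
  summarise(customers = n_distinct(id), .groups = "drop")

customers_per_cell
```

Using n_distinct(id) rather than n() keeps the count correct even if a customer appears more than once within a month.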
Task 3
Aggregate the whole dataset by Month and Group and look at the Dollars spent with a line plot.
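# Treated was converted to a factor above, so convert it back to numeric before averaging (mean() of a factor returns NA)
month_group_data <- df %>% group_by(Month, Group) %>% summarize(Dollars = mean(Dollars), Treatment = mean(as.numeric(as.character(Treated))))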
month_group_data %>% arrange(Month, Group)
ts_plot <- ggplot(month_group_data, aes(x = as.factor(Month), y = Dollars, color = Group, group = Group))
ts_plot + geom_line() + geom_point(size = 3)
df %>% group_by(Month, Group) %>% ggplot(aes(x = as.factor(Month), y = Dollars, color = Group)) + geom_boxplot()
# Drop October: during that month some customers in the 'New' group already saw the new product while others still saw the old one
customer_data <- df %>% filter(Month != "202210")
# Add a binary flag for the actual A/B testing period (November 2022 to January 2023)
customer_data$AB_period <- ifelse(customer_data$Month %in% c("202211", "202212", "202301"), 1, 0)
#
table(customer_data$Month, customer_data$AB_period)
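One possible use of the AB_period flag, not necessarily the analysis this training goes on to run, is a simple two-by-two comparison of mean Dollars by Group and period. The sketch below uses toy data with made-up values and assumes the group labels "New" and "Existing". It computes the change in mean spending for each group and then the difference between those changes.

```r
library(dplyr)

# Toy data (hypothetical values): two customers per group, each observed
# once before (AB_period = 0) and once during (AB_period = 1) the test
toy <- tibble(
  id        = rep(1:4, each = 2),
  Group     = rep(c("New", "Existing"), each = 4),
  AB_period = rep(c(0, 1), times = 4),
  Dollars   = c(10, 14, 12, 16, 9, 10, 11, 12)
)

# Mean spending in each Group x AB_period cell
cell_means <- toy %>%
  group_by(Group, AB_period) %>%
  summarise(Dollars = mean(Dollars), .groups = "drop")

# Change from the pre-period to the test period within each group
change_by_group <- cell_means %>%
  group_by(Group) %>%
  summarise(change = Dollars[AB_period == 1] - Dollars[AB_period == 0])

# Difference between the two changes
did <- change_by_group$change[change_by_group$Group == "New"] -
  change_by_group$change[change_by_group$Group == "Existing"]
did
```

A comparison of raw means like this gives a point estimate only; it says nothing about uncertainty, which would need a model with suitable standard errors.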