Introduction to A/B Testing in R

Today we analyze a fictional dataset spanning seven months, from July 2022 to January 2023, during which an A/B test was rolled out in October.

The dataset was created in data_creation.ipynb.

library(tidyverse)
# lmtest and sandwich are not directly available in Workspace, so they are installed from source here.
# Locally, install.packages("lmtest") and install.packages("sandwich") should work.
install.packages("lmtest_0.9-40.tar.gz", repos = NULL, type = "source")
install.packages("sandwich_3.0-2.tar.gz", repos = NULL, type = "source")
library(lmtest)
library(sandwich)

Task 1

Load the experiment_data.csv via read_csv and look at some random rows.

Month shows the time dimension ranging from July 2022 to January 2023.

Group indicates whether a customer is in the treatment group (New) or the control group (Existing).

Treated is always 0 for the control (Existing) group, and also 0 for the treatment (New) group before October, that is, before the experiment was implemented.

Dollars is the amount (in $) spent by our customers.

id is a personal identifier of each customer.

# Load the data and look at 15 random rows
customer_data <- read_csv("experiment_data.csv")
customer_data[sample(1:nrow(customer_data), size = 15), ]

Task 2

Look at customer_data to see how many customers we observe per month in each group. How many individual customers are there? Also look at the Treated column by Month.

cat("Rows and unique rows in the dataset:\n")
dim(customer_data)
nrow(distinct(customer_data)) # assure no duplicated rows
cat("\nUnique/distinct months in the dataset:\n")
nrow(distinct(customer_data, Month)) # Time frame
cat("\nUnique/distinct customers in the dataset:\n")
nrow(distinct(customer_data, id)) # Number of clients

cat("\nRows per group (new vs. existing feature):\n")
table(customer_data$Group) # Number of rows per group, not unique clients
cat("\n")
table(customer_data$Month, customer_data$Treated)
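
The tables above count rows, so a customer with several purchases in a month is counted several times. A minimal sketch of how distinct customers per Month and Group could be counted instead is shown below. It runs on a small hypothetical data frame, toy_data, that only borrows the column names of customer_data; the values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for customer_data (same column names, invented values)
toy_data <- tibble(
  id    = c(1, 1, 2, 3, 3, 4, 1, 2, 3, 4),
  Month = c(202207, 202207, 202207, 202207, 202207, 202207,
            202208, 202208, 202208, 202208),
  Group = c("New", "New", "New", "Existing", "Existing", "Existing",
            "New", "New", "Existing", "Existing")
)

# Compare the number of rows with the number of distinct customers
customers_per_month <- toy_data %>%
  group_by(Month, Group) %>%
  summarize(n_rows = n(), n_customers = n_distinct(id), .groups = "drop")

customers_per_month
```

Replacing toy_data with customer_data should give the per-month customer counts asked for in this task, assuming id identifies a customer uniquely across months.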

Task 3

Aggregate the whole dataset by Month and Group and look at the Dollars spent with a line plot.

# Average Dollars and share of treated observations by Month and Group
month_group_data <- customer_data %>%
  group_by(Month, Group) %>%
  summarize(Dollars = mean(Dollars), Treatment = mean(Treated), .groups = "drop")
month_group_data %>% arrange(Month, Group)
# Plot the time series using a line plot
time_series_plot <- ggplot(month_group_data,
       aes(x = as.factor(Month),
           y = Dollars,
           color = Group,
           group = Group)) +
  geom_point(size = 3) +
  geom_line(linewidth = 1.3) +
  geom_vline(xintercept = 4) # the 4th month on the axis is October
time_series_plot
# Drop October, because some customers in the 'New' group had already seen the new product while others still saw the old one
customer_data = customer_data %>% filter(Month != "202210")
# Add a binary flag for the actual A/B testing period
customer_data$AB_period = ifelse(customer_data$Month %in% c("202211", "202212", "202301"), 1, 0)
# Check that the flag lines up with the months
table(customer_data$Month, customer_data$AB_period)

Task 4

Plot the Dollars spent by Group in the actual A/B time period.

ggplot(customer_data %>% filter(AB_period == 1), aes(Dollars, fill = Group)) +
     geom_density(alpha = 0.1)

Task 5

Plot the Dollars spent by Group in the actual A/B time period again. This time, however, use a new dataset in which each customer's Dollars are averaged by period, so that no customer contributes multiple observations within the same period.

# Aggregate to the customer level so that we get one row per customer before and after seeing the "New" product
customer_data_aggregated <- customer_data %>%
  group_by(id, Treated, Group, AB_period) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop")
customer_data_aggregated <- customer_data_aggregated %>% arrange(id, Treated, Group)
head(customer_data_aggregated)
tail(customer_data_aggregated)
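
The task asks for a plot, so here is a minimal sketch of the density plot on customer-level averages, mirroring the plot from Task 4. It uses a simulated data frame, toy_aggregated, as a stand-in for customer_data_aggregated; the group means and standard deviations are arbitrary assumptions, not values from the experiment.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for customer_data_aggregated (invented values)
toy_aggregated <- data.frame(
  id        = 1:200,
  Group     = rep(c("Existing", "New"), each = 100),
  AB_period = 1,
  Dollars   = c(rnorm(100, mean = 50, sd = 10), rnorm(100, mean = 55, sd = 10))
)

# One observation per customer in the A/B period, density by Group
aggregated_density_plot <- ggplot(
  subset(toy_aggregated, AB_period == 1),
  aes(x = Dollars, fill = Group)
) +
  geom_density(alpha = 0.1)

aggregated_density_plot
```

With the real data, passing customer_data_aggregated %>% filter(AB_period == 1) in place of the subset() call should produce the corresponding plot.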

Task 6

Now let's compare the Dollars spent between New vs. Existing Group in the actual A/B testing period.
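
One possible way to make this comparison, sketched here on simulated data rather than on the experiment itself, is a Welch two-sample t-test together with the equivalent linear regression on a group dummy. The data frame toy_ab and its parameters are hypothetical, and this is not necessarily the approach taken in the rest of this notebook.

```r
set.seed(42)
# Simulated stand-in for the aggregated A/B-period data (invented values)
toy_ab <- data.frame(
  Group   = rep(c("Existing", "New"), each = 150),
  Dollars = c(rnorm(150, mean = 50, sd = 10), rnorm(150, mean = 53, sd = 10))
)

# Welch two-sample t-test (does not assume equal variances)
welch_test <- t.test(Dollars ~ Group, data = toy_ab)
welch_test

# The same comparison as a regression: the GroupNew coefficient
# is the difference in mean Dollars between New and Existing
ab_model <- lm(Dollars ~ Group, data = toy_ab)
summary(ab_model)

# Raw difference in group means, for reference
group_means <- tapply(toy_ab$Dollars, toy_ab$Group, mean)
diff_in_means <- unname(group_means["New"] - group_means["Existing"])
diff_in_means
```

The regression form is convenient because the point estimate matches the simple difference in means, while further covariates or alternative standard errors can be added later.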