A/B testing in R

Introduction to A/B Testing in R

Today, we want to analyze a fictional dataset spanning seven months from July to January, where we conducted an A/B test in October.

The dataset was generated in data_creation.ipynb.

library(tidyverse)
# lmtest and sandwich are not directly available in Workspace, so they are installed from local source archives.
# Locally, install.packages("lmtest") and install.packages("sandwich") should work.
install.packages("lmtest_0.9-40.tar.gz", repos = NULL, type = "source")
install.packages("sandwich_3.0-2.tar.gz", repos = NULL, type = "source")
#
library(lmtest)
library(sandwich)

Task 1

Load the experiment_data.csv via read_csv and look at some random rows.

Month is the time dimension, ranging from July 2022 to January 2023.

Group indicates whether a customer belongs to the treatment group (New) or the control group (Existing).

Treated is always 0 for the control (Existing) group, and also for the New group before October (that is, before the experiment started).

Dollars is the amount ($) spent by the customer.

id is a unique identifier for each customer.
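A minimal sketch of Task 1, assuming experiment_data.csv sits in the working directory. So that the snippet also runs without the file, it falls back to a small made-up dataset with the same five columns; the fallback values are purely illustrative and not the real experiment data.

```r
library(readr)
library(dplyr)

path <- "experiment_data.csv"

# Fallback (assumption): if the real file is missing, write a tiny toy
# dataset with the same column names to a temporary CSV.
if (!file.exists(path)) {
  set.seed(1)
  toy <- tibble(
    id = rep(1:40, each = 7),
    Month = rep(c(202207:202212, 202301), times = 40),
    Group = rep(rep(c("New", "Existing"), each = 7), times = 20)
  ) %>%
    mutate(
      Treated = as.integer(Group == "New" & Month >= 202210),
      Dollars = round(100 + 10 * Treated + rnorm(n(), sd = 15), 2)
    )
  path <- tempfile(fileext = ".csv")
  write_csv(toy, path)
}

customer_data <- read_csv(path, show_col_types = FALSE)

# Look at ten random rows rather than only the top of the file
sample_rows <- slice_sample(customer_data, n = 10)
sample_rows
```

Sampling rows at random gives a better impression of the mix of months and groups than head() would, since the file is likely sorted.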

Task 2

Look at customer_data to see the number of customers we observe per month in each group. How many individual customers are there? Look at the Treated column by Month.

cat("Rows and unique rows in the dataset:\n")

cat("\nUnique/distinct months in the dataset:\n")

cat("\nUnique/distinct customers in the dataset:\n")


cat("\nNumber of clients by group ('New' vs 'Existing'):\n")

cat("\n")
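One possible way to fill in the scaffold above is sketched here. It runs on a made-up stand-in for customer_data (an assumption, so the snippet is self-contained); with the real data, skip the toy block and use the loaded customer_data instead.

```r
library(dplyr)

# Toy stand-in for customer_data (assumption: made-up values, same columns)
set.seed(1)
customer_data <- tibble(
  id = rep(1:40, each = 7),
  Month = rep(c(202207:202212, 202301), times = 40),
  Group = rep(rep(c("New", "Existing"), each = 7), times = 20)
) %>%
  mutate(
    Treated = as.integer(Group == "New" & Month >= 202210),
    Dollars = 100 + 10 * Treated + rnorm(n(), sd = 15)
  )

cat("Rows and unique rows in the dataset:\n")
n_rows <- nrow(customer_data)
n_unique_rows <- nrow(distinct(customer_data))
print(c(rows = n_rows, unique_rows = n_unique_rows))

cat("\nUnique/distinct months in the dataset:\n")
months <- sort(unique(customer_data$Month))
print(months)

cat("\nUnique/distinct customers in the dataset:\n")
n_customers <- n_distinct(customer_data$id)
print(n_customers)

cat("\nNumber of clients by group ('New' vs 'Existing'):\n")
# Count each customer once, not once per month
customers_by_group <- customer_data %>%
  distinct(id, Group) %>%
  count(Group)
print(customers_by_group)

cat("\nCustomers and share treated by month and group:\n")
treated_by_month <- customer_data %>%
  group_by(Month, Group) %>%
  summarize(
    customers = n_distinct(id),
    share_treated = mean(Treated),
    .groups = "drop"
  )
print(treated_by_month, n = Inf)
```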

Task 3

Aggregate the whole dataset by Month and Group and look at the Dollars spent with a line plot.

if(FALSE) {
# Average Dollars and the share of treated customers per Month and Group
month_group_data <- customer_data %>%
  group_by(Month, Group) %>%
  summarize(Dollars = mean(Dollars), Treatment = mean(Treated), .groups = "drop")
month_group_data %>% arrange(Month, Group)
}
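The line plot itself could look roughly like the following sketch. It uses a made-up stand-in for customer_data (an assumption) so that it runs on its own; the axis labels and the choice to treat Month as a discrete label are illustrative, not prescribed by the task.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for customer_data (assumption: made-up values, same columns)
set.seed(1)
customer_data <- tibble(
  id = rep(1:40, each = 7),
  Month = rep(c(202207:202212, 202301), times = 40),
  Group = rep(rep(c("New", "Existing"), each = 7), times = 20)
) %>%
  mutate(
    Treated = as.integer(Group == "New" & Month >= 202210),
    Dollars = 100 + 10 * Treated + rnorm(n(), sd = 15)
  )

month_group_data <- customer_data %>%
  group_by(Month, Group) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop")

# factor(Month) keeps the months evenly spaced; as a plain number,
# the step from 202212 to 202301 would show up as a large gap.
p <- ggplot(
  month_group_data,
  aes(x = factor(Month), y = Dollars, colour = Group, group = Group)
) +
  geom_line() +
  geom_point() +
  labs(x = "Month", y = "Mean Dollars spent")
print(p)
```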

if(FALSE) {
# Drop October, because during that month some customers in the 'New' group had already seen the new product while others still saw the old one
customer_data <- customer_data %>% filter(Month != "202210")
# Add a binary indicator for the actual A/B testing period (November to January)
customer_data$AB_period <- ifelse(customer_data$Month %in% c("202211", "202212", "202301"), 1, 0)
# Check that the indicator lines up with the months
table(customer_data$Month, customer_data$AB_period)
}

Task 4

Plot the Dollars spent by Group in the actual A/B time period.
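A boxplot is one reasonable choice for this comparison; the sketch below assumes that and again runs on a made-up stand-in for customer_data (an assumption), restricted to November through January.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for customer_data (assumption: made-up values, same columns)
set.seed(1)
customer_data <- tibble(
  id = rep(1:40, each = 7),
  Month = rep(c(202207:202212, 202301), times = 40),
  Group = rep(rep(c("New", "Existing"), each = 7), times = 20)
) %>%
  mutate(
    Treated = as.integer(Group == "New" & Month >= 202210),
    Dollars = 100 + 10 * Treated + rnorm(n(), sd = 15)
  )

# Keep only the actual A/B period (November to January)
ab_data <- customer_data %>%
  filter(Month %in% c(202211, 202212, 202301))

p <- ggplot(ab_data, aes(x = Group, y = Dollars, fill = Group)) +
  geom_boxplot() +
  labs(x = "Group", y = "Dollars spent", title = "Dollars spent in the A/B period")
print(p)
```

At this stage every customer contributes several rows (one per month), so the boxes overstate the number of independent observations.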

Task 5

Plot again the Dollars spent by Group in the actual A/B time period. This time, however, on a new dataset where we averaged the individual Dollars spent (by period) to avoid having multiple observations by the same customer during the same period.

if(FALSE) {
# Now aggregate at the customer level so that we get one row for each customer before and after seeing the "New" product
customer_data_aggregated <- customer_data %>%
  group_by(id, Treated, Group, AB_period) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop")
customer_data_aggregated <- customer_data_aggregated %>% arrange(id, Treated, Group)
head(customer_data_aggregated)
tail(customer_data_aggregated)
}

Task 6

Now let's compare the Dollars spent between New vs. Existing Group in the actual A/B testing period.
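The notebook loads lmtest and sandwich but does not show how they are meant to be used, so the following is only one plausible approach: a Welch two-sample t-test on customer-level averages, followed by the same comparison as a regression with heteroskedasticity-robust (HC1) standard errors. The data are a made-up stand-in for customer_data (an assumption).

```r
library(dplyr)
library(lmtest)
library(sandwich)

# Toy stand-in for customer_data (assumption: made-up values, same columns)
set.seed(1)
customer_data <- tibble(
  id = rep(1:40, each = 7),
  Month = rep(c(202207:202212, 202301), times = 40),
  Group = rep(rep(c("New", "Existing"), each = 7), times = 20)
) %>%
  mutate(
    Treated = as.integer(Group == "New" & Month >= 202210),
    Dollars = 100 + 10 * Treated + rnorm(n(), sd = 15)
  )

# One row per customer: average Dollars over the A/B period
ab_customers <- customer_data %>%
  filter(Month %in% c(202211, 202212, 202301)) %>%
  group_by(id, Group) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop")

# Welch two-sample t-test (does not assume equal variances)
welch <- t.test(Dollars ~ Group, data = ab_customers)
print(welch)

# Same comparison as a regression; the GroupNew coefficient is the
# difference in mean Dollars between New and Existing
model <- lm(Dollars ~ Group, data = ab_customers)
robust <- coeftest(model, vcov. = vcovHC(model, type = "HC1"))
print(robust)
```

With a single binary regressor, the regression coefficient equals the difference in group means, so the two outputs should tell the same story and differ only slightly in their standard errors.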

Task 7

But we could also compare only the New group before and after implementing the A/B test. Let's do that!
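Because the same customers are observed before and after, a paired t-test on customer-level averages is a natural candidate here. The sketch below assumes that choice and uses a made-up stand-in for customer_data (an assumption); October is dropped, as in the earlier step.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for customer_data (assumption: made-up values, same columns)
set.seed(1)
customer_data <- tibble(
  id = rep(1:40, each = 7),
  Month = rep(c(202207:202212, 202301), times = 40),
  Group = rep(rep(c("New", "Existing"), each = 7), times = 20)
) %>%
  mutate(
    Treated = as.integer(Group == "New" & Month >= 202210),
    Dollars = 100 + 10 * Treated + rnorm(n(), sd = 15)
  )

# One row per New customer with average Dollars before and after
new_wide <- customer_data %>%
  filter(Group == "New", Month != 202210) %>%
  mutate(period = if_else(Month >= 202211, "after", "before")) %>%
  group_by(id, period) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop") %>%
  pivot_wider(names_from = period, values_from = Dollars)

# Paired test: each customer serves as their own control
paired <- t.test(new_wide$after, new_wide$before, paired = TRUE)
print(paired)
```

A before/after comparison cannot separate the effect of the new product from anything else that changed over time, such as seasonality, which is why the Existing group remains a useful check.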

Task 8

Calculate the standard deviation of the Dollars spent by the New group in the A/B period, and use power.t.test() to calculate the sample size needed to get statistically significant results at the 0.05 significance level, assuming power = 0.8 (and equal variances).

#round(sd(customer_data_aggregated %>% filter(Group == "New" & AB_period == 1) %>% pull(Dollars)), 1)
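power.t.test() also needs the difference in means that should be detectable (delta), which the task does not specify. The sketch below therefore uses a hypothetical min_effect of 5 Dollars, purely as an assumption, and computes the standard deviation from a made-up stand-in for customer_data.

```r
library(dplyr)

# Toy stand-in for customer_data (assumption: made-up values, same columns)
set.seed(1)
customer_data <- tibble(
  id = rep(1:40, each = 7),
  Month = rep(c(202207:202212, 202301), times = 40),
  Group = rep(rep(c("New", "Existing"), each = 7), times = 20)
) %>%
  mutate(
    Treated = as.integer(Group == "New" & Month >= 202210),
    Dollars = 100 + 10 * Treated + rnorm(n(), sd = 15)
  )

# Customer-level average Dollars of the New group in the A/B period
new_ab <- customer_data %>%
  filter(Group == "New", Month %in% c(202211, 202212, 202301)) %>%
  group_by(id) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop")

sd_new <- sd(new_ab$Dollars)

# Assumption: hypothetical smallest difference in Dollars worth detecting
min_effect <- 5

power_result <- power.t.test(
  delta = min_effect,
  sd = sd_new,
  sig.level = 0.05,
  power = 0.8,
  type = "two.sample",
  alternative = "two.sided"
)
print(power_result)

# power.t.test() reports n per group; round up to whole customers
n_per_group <- ceiling(power_result$n)
n_per_group
```

The required sample size grows quickly as min_effect shrinks relative to sd_new, so it is worth trying a few plausible values rather than relying on a single one.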
