## Introduction to A/B Testing in R

Today, we want to analyze a fictional dataset spanning seven months from July to January, where we conducted an A/B test in October.

The dataset was created in *data_creation.ipynb*.

```
library(tidyverse)
# lmtest and sandwich are not available by default in Workspace, so install them first.
# Locally, install.packages("lmtest") and install.packages("sandwich") should work the same way.
install.packages("lmtest")
install.packages("sandwich")
library(lmtest)
library(sandwich)
```

### Task 1

Load **experiment_data.csv** with `read_csv` and look at some random rows. The columns are:

- `Month`: the time dimension, ranging from July 2022 to January 2023.
- `Group`: whether a customer is in the treatment (*New*) or control (*Existing*) group.
- `Treated`: always 0 for the control (*Existing*) group, and also 0 for the treatment (*New*) group before October (prior to implementing the experiment).
- `Dollars`: the amount in dollars spent by the customer.
- `id`: a personal identifier of the customer.

```
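# Name the data customer_data, which is the name the later tasks use
customer_data <- read_csv("experiment_data.csv")
# Show ten randomly chosen rows
slice_sample(customer_data, n = 10)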
```

### Task 2

Look at `customer_data` to see the number of customers we observe per month in each group. How many individual customers are there? Also look at the `Treated` column by `Month`.

```
cat("Rows and unique rows in the dataset:\n")
cat("\nUnique/distinct months in the dataset:\n")
cat("\nUnique/distinct customers in the dataset:\n")
cat("\nNumber of clients by group ('New' vs 'Existing'):\n")
cat("\n")
```
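
One way to fill in the scaffold above is sketched below. Because the real file is not available here, the sketch runs on `toy_data`, a small hypothetical dataset that only mimics the column names described in Task 1 (40 customers, 7 months, an assumed effect of 5 dollars); none of its numbers come from **experiment_data.csv**.

```r
library(dplyr)

# Hypothetical toy data with the same columns as experiment_data.csv (not the real file)
set.seed(1)
toy_data <- expand.grid(
  id = 1:40,
  Month = c("202207", "202208", "202209", "202210", "202211", "202212", "202301"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    Group   = ifelse(id <= 20, "New", "Existing"),
    Treated = as.integer(Group == "New" & Month >= "202210"),
    Dollars = 50 + 5 * Treated + rnorm(n(), sd = 10)
  )

cat("Rows and unique rows in the dataset:\n")
print(c(rows = nrow(toy_data), unique_rows = nrow(distinct(toy_data))))

cat("\nUnique/distinct months in the dataset:\n")
print(sort(unique(toy_data$Month)))

cat("\nUnique/distinct customers in the dataset:\n")
n_customers <- n_distinct(toy_data$id)
print(n_customers)

cat("\nNumber of clients by group ('New' vs 'Existing'):\n")
customers_per_group <- toy_data %>% distinct(id, Group) %>% count(Group)
print(customers_per_group)

cat("\nCustomers observed per month and group:\n")
print(count(toy_data, Month, Group))

cat("\nShare of treated observations by month:\n")
treated_by_month <- toy_data %>%
  group_by(Month) %>%
  summarize(Treated = mean(Treated))
print(treated_by_month)
```

On the real data, replacing `toy_data` with `customer_data` should give the corresponding counts.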

### Task 3

Aggregate the whole dataset by `Month` and `Group` and look at the `Dollars` spent with a line plot.

```
if(FALSE) {
# Average dollars spent and share of treated observations per month and group
month_group_data <- customer_data %>%
  group_by(Month, Group) %>%
  summarize(Dollars = mean(Dollars), Treatment = mean(Treated), .groups = "drop")
month_group_data %>% arrange(Month, Group)
}
```
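
The aggregation above can be turned into the line plot with `ggplot2`. The following is a minimal sketch on hypothetical toy data (`toy_data` and `month_group_toy` are illustrative names, and the simulated effect of 5 dollars is an assumption, not a property of the real dataset).

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with the same columns as experiment_data.csv (not the real file)
set.seed(1)
toy_data <- expand.grid(
  id = 1:40,
  Month = c("202207", "202208", "202209", "202210", "202211", "202212", "202301"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    Group   = ifelse(id <= 20, "New", "Existing"),
    Treated = as.integer(Group == "New" & Month >= "202210"),
    Dollars = 50 + 5 * Treated + rnorm(n(), sd = 10)
  )

# One row per month and group with the mean dollars spent
month_group_toy <- toy_data %>%
  group_by(Month, Group) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop")

# Month is stored as text, so group = Group tells ggplot which points to connect
p <- ggplot(month_group_toy, aes(x = Month, y = Dollars, color = Group, group = Group)) +
  geom_line() +
  geom_point() +
  labs(x = "Month", y = "Mean dollars spent", color = "Group")
print(p)
```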

```
if(FALSE) {
# Drop October: some customers in the 'New' group already saw the new product, others still saw the old one
customer_data <- customer_data %>% filter(Month != "202210")
# Add a binary indicator for the actual A/B testing period (November to January)
customer_data$AB_period <- ifelse(customer_data$Month %in% c("202211", "202212", "202301"), 1, 0)
# Check which months fall into which period
table(customer_data$Month, customer_data$AB_period)
}
```

### Task 4

Plot the `Dollars` spent by `Group` in the actual A/B time period.
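
A boxplot is one possible way to draw this comparison; the task does not prescribe a plot type, so treat the choice as an assumption. The sketch uses hypothetical toy data (`toy_data`, `ab_toy`) rather than the real file and keeps every monthly observation from November to January.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with the same columns as experiment_data.csv (not the real file)
set.seed(1)
toy_data <- expand.grid(
  id = 1:40,
  Month = c("202207", "202208", "202209", "202210", "202211", "202212", "202301"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    Group   = ifelse(id <= 20, "New", "Existing"),
    Treated = as.integer(Group == "New" & Month >= "202210"),
    Dollars = 50 + 5 * Treated + rnorm(n(), sd = 10)
  )

# Keep only observations from the A/B period (November to January)
ab_toy <- toy_data %>%
  mutate(AB_period = as.integer(Month %in% c("202211", "202212", "202301"))) %>%
  filter(AB_period == 1)

# Distribution of monthly dollars spent, one box per group
p <- ggplot(ab_toy, aes(x = Group, y = Dollars, fill = Group)) +
  geom_boxplot() +
  labs(x = "Group", y = "Dollars spent (monthly observations)")
print(p)
```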

### Task 5

Plot the `Dollars` spent by `Group` in the actual A/B time period again. This time, however, use a new dataset in which each customer's `Dollars` are averaged by period, so that the same customer does not contribute multiple observations within the same period.

```
if(FALSE) {
# Aggregate to the customer level so that we get one row per customer before and after seeing the "New" product
customer_data_aggregated <- customer_data %>%
  group_by(id, Treated, Group, AB_period) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop") %>%
  arrange(id, Treated, Group)
head(customer_data_aggregated)
tail(customer_data_aggregated)
}
```
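
The plot on the customer-level averages could look like the sketch below. It again runs on hypothetical toy data (`toy_data`, `toy_aggregated`), so only the structure carries over to the real dataset, not the numbers.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with the same columns as experiment_data.csv (not the real file)
set.seed(1)
toy_data <- expand.grid(
  id = 1:40,
  Month = c("202207", "202208", "202209", "202210", "202211", "202212", "202301"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    Group   = ifelse(id <= 20, "New", "Existing"),
    Treated = as.integer(Group == "New" & Month >= "202210"),
    Dollars = 50 + 5 * Treated + rnorm(n(), sd = 10)
  )

# Drop October, flag the A/B period, then average per customer and period
toy_aggregated <- toy_data %>%
  filter(Month != "202210") %>%
  mutate(AB_period = as.integer(Month %in% c("202211", "202212", "202301"))) %>%
  group_by(id, Group, AB_period) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop")

# One observation per customer in the A/B period
ab_customers <- toy_aggregated %>% filter(AB_period == 1)

p <- ggplot(ab_customers, aes(x = Group, y = Dollars, fill = Group)) +
  geom_boxplot() +
  labs(x = "Group", y = "Mean dollars spent per customer")
print(p)
```

Averaging first narrows the boxes compared with the monthly observations, because each point is now a mean of three months.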

### Task 6

Now let's compare the `Dollars` spent between the `New` and `Existing` `Group` in the actual A/B testing period.
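
One simple way to make this comparison is a Welch two-sample t-test on the customer-level averages. This is a sketch under assumptions, not necessarily the intended solution: it uses base R's `t.test()` and hypothetical toy data (`toy_data`, `ab_customers`), so the estimate it prints says nothing about the real experiment.

```r
library(dplyr)

# Hypothetical toy data with the same columns as experiment_data.csv (not the real file)
set.seed(1)
toy_data <- expand.grid(
  id = 1:40,
  Month = c("202207", "202208", "202209", "202210", "202211", "202212", "202301"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    Group   = ifelse(id <= 20, "New", "Existing"),
    Treated = as.integer(Group == "New" & Month >= "202210"),
    Dollars = 50 + 5 * Treated + rnorm(n(), sd = 10)
  )

# One average per customer over the A/B period (November to January)
ab_customers <- toy_data %>%
  filter(Month %in% c("202211", "202212", "202301")) %>%
  group_by(id, Group) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop")

# Difference in group means: New minus Existing
group_means <- with(ab_customers, tapply(Dollars, Group, mean))
diff_new_minus_existing <- unname(group_means["New"] - group_means["Existing"])
cat("Estimated difference (New - Existing):", round(diff_new_minus_existing, 2), "\n")

# Welch two-sample t-test (does not assume equal variances)
welch <- t.test(Dollars ~ Group, data = ab_customers)
print(welch)
```

Testing on one row per customer keeps the observations independent, which the t-test assumes. A regression of `Dollars` on `Group` would be an alternative route to the same comparison.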

### Task 7

We could also compare only the `New` group before and after implementing the A/B test. Let's do that!
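
Since every customer in the `New` group is observed both before and during the A/B period, a paired t-test on per-customer averages is one natural sketch of this comparison. The choice of test and the toy data (`toy_data`, `new_periods`) are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data with the same columns as experiment_data.csv (not the real file)
set.seed(1)
toy_data <- expand.grid(
  id = 1:40,
  Month = c("202207", "202208", "202209", "202210", "202211", "202212", "202301"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    Group   = ifelse(id <= 20, "New", "Existing"),
    Treated = as.integer(Group == "New" & Month >= "202210"),
    Dollars = 50 + 5 * Treated + rnorm(n(), sd = 10)
  )

# Per-customer averages for the New group, before (0) and during (1) the A/B period
new_periods <- toy_data %>%
  filter(Group == "New", Month != "202210") %>%
  mutate(AB_period = as.integer(Month %in% c("202211", "202212", "202301"))) %>%
  group_by(id, AB_period) %>%
  summarize(Dollars = mean(Dollars), .groups = "drop") %>%
  arrange(id, AB_period)

# Both vectors are sorted by id, so element i belongs to the same customer
before <- new_periods$Dollars[new_periods$AB_period == 0]
after  <- new_periods$Dollars[new_periods$AB_period == 1]

paired <- t.test(after, before, paired = TRUE)
print(paired)
```

A before/after comparison cannot separate the effect of the new product from anything else that changed over time, such as seasonal spending, which is why the `Existing` group is useful as a control.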

### Task 8

Calculate the standard deviation of the `Dollars` spent by the `New` group in the A/B period and use `power.t.test()` to calculate the sample size needed to get statistically significant results at the 0.05 significance level (`sig.level = 0.05`), assuming `power = 0.8` and equal variances.
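
A minimal sketch of this calculation follows. It makes two assumptions the task leaves open: the standard deviation is taken from per-customer averages, and `min_effect` (the smallest difference in dollars worth detecting) is a hypothetical value of 5. The data are toy data, not the real file.

```r
library(dplyr)

# Hypothetical toy data with the same columns as experiment_data.csv (not the real file)
set.seed(1)
toy_data <- expand.grid(
  id = 1:40,
  Month = c("202207", "202208", "202209", "202210", "202211", "202212", "202301"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    Group   = ifelse(id <= 20, "New", "Existing"),
    Treated = as.integer(Group == "New" & Month >= "202210"),
    Dollars = 50 + 5 * Treated + rnorm(n(), sd = 10)
  )

# Standard deviation of per-customer average dollars in the New group during the A/B period
new_ab <- toy_data %>%
  filter(Group == "New", Month %in% c("202211", "202212", "202301")) %>%
  group_by(id) %>%
  summarize(Dollars = mean(Dollars))
sd_new <- sd(new_ab$Dollars)

# Hypothetical smallest difference in dollars that we want to be able to detect
min_effect <- 5

# Leaving n unspecified makes power.t.test() solve for the sample size per group
power_result <- power.t.test(
  delta = min_effect, sd = sd_new,
  sig.level = 0.05, power = 0.8,
  type = "two.sample", alternative = "two.sided"
)
print(power_result)

# Round up, since a fraction of a customer cannot be sampled
n_per_group <- ceiling(power_result$n)
cat("Customers needed per group:", n_per_group, "\n")
```

The result is the number of customers per group, so the total sample is twice that. A smaller `min_effect` or a larger standard deviation increases the required sample size.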