1. Designing the Experiment

To reach a conclusion that cannot be attributed to random chance, the following must be defined up front:

  • Null Hypothesis: The conversion rate of each variation (Var 1, Var 2, Var 3) is the same as the conversion rate of the control group
  • Alternative Hypothesis: The conversion rate of each variation (Var 1, Var 2, Var 3) differs from (is better or worse than) the conversion rate of the control group
  • Confidence level: 95%; Alpha: 0.05
  • Hypothesis Test: Two-tailed test of proportions
  • Approach: P-value
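
The design choices above can be illustrated with a minimal sketch of a two-tailed, two-proportion z-test that uses the p-value approach. The function name `two_proportion_ztest` and the counts passed to it are hypothetical and are not taken from the dataset; the pooled standard error is one common formulation of this test, written here with the standard library only.

```python
import math

ALPHA = 0.05  # significance level from the experiment design


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-tailed z-test for a difference in proportions (pooled standard error)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-tailed p-value: P(|Z| > |z|) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only
z_stat, p_value = two_proportion_ztest(conv_a=120, n_a=1000, conv_b=100, n_b=1000)
reject_null = p_value < ALPHA
print(f"z = {z_stat:.3f}, p-value = {p_value:.3f}, reject null: {reject_null}")
```

The null hypothesis is rejected only when the p-value falls below alpha; because the test is two-tailed, a variation that performs worse than the control is detected as readily as one that performs better.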

2. Preparing the Data

Before analysing the data, the following steps must be done:

  • Load the necessary imports
  • Read the given data
  • Describe the data
  • Check for inconsistent data

# Load the necessary imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

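# Read the CSV file into a dataframe
df = pd.read_csv('./data/redesign.csv')
df.head(n=5)

# Inspect column types and summary statistics
df.info()  # prints its report directly and returns None, so no print() is needed
print(df.describe())

# Count the observations in each treatment / new_images combination
pd.crosstab(df['treatment'], df['new_images'])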

Upon checking the dataframe, the following were observed:

  • The dataframe consists of three columns: treatment, new_images, and converted
  • The values of the treatment and new_images columns are both of datatype object
  • The values of the converted column are binary and of datatype int64
  • The dataframe contains 40,484 observations in total
  • The crosstab shows that each of the three variations and the control group has the same number of observations: 10,121

The data must be clean before it is analysed, so each column is first checked for unexpected values.

print(df['treatment'].value_counts())
print(df['new_images'].value_counts())
print(df['converted'].value_counts())
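
The value counts above reveal unexpected categories at a glance. Two further checks, one for missing values and one for values outside the expected set, could be sketched as follows. The toy frame `toy` and the `allowed` sets are assumptions for illustration; they mirror the 'yes'/'no' and binary values this walkthrough relies on, not the actual file contents.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataframe
toy = pd.DataFrame({
    "treatment": ["yes", "no", "yes", "no"],
    "new_images": ["yes", "yes", "no", "no"],
    "converted": [1, 0, 0, 1],
})

# Check 1: count missing values in each column
missing_per_column = toy.isna().sum()

# Check 2: list any value that falls outside the assumed set of valid values
allowed = {
    "treatment": {"yes", "no"},
    "new_images": {"yes", "no"},
    "converted": {0, 1},
}
unexpected = {
    col: sorted(set(toy[col].dropna().tolist()) - valid)
    for col, valid in allowed.items()
}

is_clean = bool(missing_per_column.sum() == 0) and not any(unexpected.values())
print(missing_per_column)
print(unexpected)
print("clean:", is_clean)
```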

The data looks clean. The next step is to prepare it for further analysis.

The initial inspection shows that the observations need to be labelled by group: one label for each variation and one for the control group. Once labelled, the data is easier to split and process for A/B testing.

# Label each observation by its treatment / new_images combination.
# np.select replaces the row-by-row loop; anything unmatched falls back to 'Control'.
conditions = [
    (df['treatment'] == 'yes') & (df['new_images'] == 'yes'),
    (df['treatment'] == 'yes') & (df['new_images'] == 'no'),
    (df['treatment'] == 'no') & (df['new_images'] == 'yes'),
]
labels = ['Var 1', 'Var 2', 'Var 3']

df['variation'] = np.select(conditions, labels, default='Control')

print(df.head())

A new column named 'variation' was created. It assigns each observation to one of the following subgroups:

  • Var 1 - users who saw the new set of images and the new design of the landing page.
  • Var 2 - users who saw the old set of images and the new design of the landing page.
  • Var 3 - users who saw the new set of images and the old design of the landing page.
  • Control - users who saw the old set of images and the old design of the landing page. This subgroup serves as the control group.
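
One way to confirm that the four labels listed above line up with the intended treatment / new_images combinations is to cross-tabulate them. The sketch below uses a small hypothetical frame and a lookup dictionary (`label_for`, an illustrative name) rather than the real data.

```python
import pandas as pd

# Hypothetical toy frame; the real data has the same two columns
toy = pd.DataFrame({
    "treatment": ["yes", "yes", "no", "no", "yes"],
    "new_images": ["yes", "no", "yes", "no", "yes"],
})

# Lookup keyed on (treatment, new_images), matching the subgroup definitions
label_for = {
    ("yes", "yes"): "Var 1",
    ("yes", "no"): "Var 2",
    ("no", "yes"): "Var 3",
    ("no", "no"): "Control",
}
toy["variation"] = [
    label_for[pair] for pair in zip(toy["treatment"], toy["new_images"])
]

# Rows are the combinations, columns are the labels
check = pd.crosstab([toy["treatment"], toy["new_images"]], toy["variation"])

# Each combination should map to exactly one label
one_label_per_combo = bool(((check > 0).sum(axis=1) == 1).all())
print(check)
print("one label per combination:", one_label_per_combo)
```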

To simplify the data, a subset of the modified dataframe containing only the 'variation' and 'converted' columns is created. This subset is then split into four subgroups, one for each variation and one for the control group.

#Subsetting variation and converted columns
simplified_df = df[['variation','converted']]

#Splitting subset into 4 subgroups
var1_df = simplified_df[simplified_df['variation'] == 'Var 1']
var2_df = simplified_df[simplified_df['variation'] == 'Var 2']
var3_df = simplified_df[simplified_df['variation'] == 'Var 3']
control_df = simplified_df[simplified_df['variation'] == 'Control']

var1_df.head()
var2_df.head()
var3_df.head()
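
As a quick sanity check on a split like this, the group sizes and conversion rates can also be summarised in one table with a group-by. The sketch below is only illustrative: the frame `toy` and its values are hypothetical and do not reflect the actual results.

```python
import pandas as pd

# Hypothetical toy data with the same two columns as simplified_df
toy = pd.DataFrame({
    "variation": ["Var 1", "Var 1", "Var 2", "Var 2",
                  "Var 3", "Var 3", "Control", "Control"],
    "converted": [1, 0, 1, 1, 0, 0, 0, 1],
})

# One row per group: sample size, number of conversions, and conversion rate
summary = toy.groupby("variation")["converted"].agg(
    observations="count",
    conversions="sum",
    conversion_rate="mean",
)
print(summary)
```

Because converted is binary, its mean within a group is that group's conversion rate, and the counts and sums are the inputs a test of proportions needs.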