
Which version of the website should you use?

📖 Background

You work for an early-stage startup in Germany. Your team has been working on a redesign of the landing page. The team believes a new design will increase the number of people who click through and join your site.

They have been testing the changes for a few weeks. Now they want to measure the impact, and they need you to determine whether the observed increase is statistically significant or could be due to random chance.
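To make "statistically significant" concrete, here is a minimal sketch of a pooled two-proportion z-test written with only the standard library. The function name `two_proportion_ztest` and the counts in the usage lines are hypothetical; they are not taken from the redesign data.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for the difference between two conversion rates.

    conv_a, conv_b: number of conversions in each group
    n_a, n_b: number of users in each group
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, for illustration only
z, p_value = two_proportion_ztest(conv_a=120, n_a=1000, conv_b=100, n_b=1000)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

A p-value below the chosen significance level (commonly 0.05) would suggest that the difference is unlikely to be explained by chance alone.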

💾 The data

The team assembled the following file:

Redesign test data
  • "treatment" - "yes" if the user saw the new version of the landing page, "no" otherwise.
  • "new_images" - "yes" if the page used a new set of images, "no" otherwise.
  • "converted" - 1 if the user joined the site, 0 otherwise.

The control group is those users with "no" in both columns: the old version with the old set of images.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

Let's import the data and look at the first few rows.

df = pd.read_csv('./data/redesign.csv')
df.head()

Data Exploration & Preparation

First, let's check for missing data.

df.isnull().any()

To simplify the analysis, let's convert 'yes' to 1 and 'no' to 0 in the 'treatment' and 'new_images' columns.

df['treatment'] = df['treatment'].map(dict(yes=1, no=0))
df['new_images'] = df['new_images'].map(dict(yes=1, no=0))
df.head()
df.describe()
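One caveat with `Series.map` and a dictionary: any value that is not a key in the dictionary becomes NaN. The sketch below shows this on a small made-up Series; the sample values are hypothetical and only illustrate the behaviour.

```python
import pandas as pd

# Made-up values; 'Yes' (capitalised) is deliberately not a key in the mapping
sample = pd.Series(['yes', 'no', 'Yes'])
mapped = sample.map({'yes': 1, 'no': 0})

# Unmatched values become NaN, so counting them confirms whether the mapping was complete
n_unmapped = int(mapped.isna().sum())
print(n_unmapped)
```

Running `df[['treatment', 'new_images']].isna().sum()` after the conversion would flag the same problem on the real data.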

Next, split the users into four groups:

  • Control_Group : the old version of the website (no new landing page, no new images)
  • A : the new version of the landing page without the new images
  • B : the new images without the new version of the landing page
  • C : the new version of the landing page with the new images
conditions = [
    (df['treatment'] == 0) & (df['new_images'] == 0), 
    (df['treatment'] == 1) & (df['new_images'] == 0), 
    (df['treatment'] == 0) & (df['new_images'] == 1), 
    (df['treatment'] == 1) & (df['new_images'] == 1)
]

values = ['Control_Group', 'A', 'B', 'C']

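# Pass a string default: np.select's default of 0 does not share a dtype with the
# string labels, which recent NumPy versions reject
df['groups'] = np.select(conditions, values, default='Unknown')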
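A quick way to confirm that each combination of flags landed in the intended group is to count the rows per label. The sketch below rebuilds the same assignment on a small made-up DataFrame (`toy` and its rows are hypothetical). The explicit `default='Unassigned'` label is an assumption added here so that rows matching no condition would be easy to spot.

```python
import numpy as np
import pandas as pd

# Made-up rows covering all four flag combinations (one combination appears twice)
toy = pd.DataFrame({
    'treatment':  [0, 1, 0, 1, 1],
    'new_images': [0, 0, 1, 1, 1],
})

conditions = [
    (toy['treatment'] == 0) & (toy['new_images'] == 0),
    (toy['treatment'] == 1) & (toy['new_images'] == 0),
    (toy['treatment'] == 0) & (toy['new_images'] == 1),
    (toy['treatment'] == 1) & (toy['new_images'] == 1),
]
labels = ['Control_Group', 'A', 'B', 'C']

# Rows that match no condition would receive the 'Unassigned' label
toy['groups'] = np.select(conditions, labels, default='Unassigned')

# Number of rows per group
group_sizes = toy['groups'].value_counts()
print(group_sizes)
```

If 'Unassigned' shows up in the counts, some rows have flag values other than 0 or 1, for example NaN left over from an incomplete mapping.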

1. Analyse the conversion rates for each of the 4 groups

df.groupby('groups')['converted'].mean()
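Rates alone hide how many users stand behind each figure. A minimal sketch of reporting the number of conversions and the group size next to the rate, on made-up data (`toy` and its values are hypothetical):

```python
import pandas as pd

# Made-up data: 2 control users and 4 users in group A
toy = pd.DataFrame({
    'groups':    ['Control_Group', 'Control_Group', 'A', 'A', 'A', 'A'],
    'converted': [0, 1, 1, 0, 0, 0],
})

summary = toy.groupby('groups')['converted'].agg(
    rate='mean',        # share of users who converted
    conversions='sum',  # number of users who converted
    users='count',      # group size
)
print(summary)
```

The `conversions` and `users` columns are the counts a two-proportion test such as `proportions_ztest` takes as input.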

colors = ['darkgray', 'lightskyblue', 'lightskyblue', 'lightskyblue']

sns.barplot(data=df, x='groups', y='converted', order=values, palette=colors)