A-B testing a new website design
    Which version of the website should you use?

    📖 Background

    You work for an early-stage startup in Germany. Your team has been working on a redesign of the landing page. The team believes a new design will increase the number of people who click through and join your site.

    They have been testing the changes for a few weeks, and now they want to measure the impact of the change. They need you to determine whether the observed increase could be due to random chance or whether it is statistically significant.

    💾 The data

    The team assembled the following file:

    Redesign test data
    • "treatment" - "yes" if the user saw the new version of the landing page, "no" otherwise.
    • "new_images" - "yes" if the page used a new set of images, "no" otherwise.
    • "converted" - 1 if the user joined the site, 0 otherwise.

    The control group consists of users with "no" in both columns: they saw the old version of the page with the old set of images.

    💪 Challenge

    Complete the following tasks:

    1. Analyze the conversion rates for each of the four groups: the new/old design of the landing page and the new/old pictures.
    2. Can the increases observed be explained by randomness? (Hint: Think A/B test)
    3. Which version of the website should they use?

    Imports

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    from scipy.stats import chi2_contingency
    from statsmodels.stats.proportion import proportions_ztest

    # Load the redesign test data
    df = pd.read_csv('./data/redesign.csv')
    df.head()
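
    The brief defines the control group as the users with "no" in both columns. As a quick sanity check (a minimal sketch reusing the df loaded above), a cross-tabulation shows the four treatment/new_images combinations and how many users fall into each:

    # Count users in each of the four treatment / new_images combinations;
    # the ("no", "no") cell is the control group described in the brief.
    pd.crosstab(df['treatment'], df['new_images'], margins=True)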

    Q1. Analyze the conversion rates for each of the four groups: new/old design of the landing page and new/old pictures

    # Overall conversion summary. The aggregation order is given as a list (not a
    # set) so the renamed columns below line up with the right statistics.
    summary = df['converted'].agg(['sum', 'count', 'mean', 'std']).to_frame().T
    summary.columns = ['total_converted', 'total_count', 'conversion_rate', 'stdev']
    summary
    # Conversion summary per treatment / new_images combination. Again, a fixed
    # aggregation order keeps the column rename unambiguous.
    gdf = (
        df.groupby(['treatment', 'new_images'])['converted']
          .agg(['sum', 'count', 'mean', 'std'])
          .reset_index()
    )
    gdf.columns = ['treatment', 'new_images', 'total_converted', 'total_count', 'conversion_rate', 'std']
    gdf['treatment'] = gdf['treatment'].str.upper()
    gdf['new_images'] = gdf['new_images'].str.upper()
    gdf['group'] = gdf['treatment'] + ' treatment | ' + gdf['new_images'] + ' new_images'
    gdf = gdf.sort_values(by='conversion_rate', ascending=False)
    gdf
    plt.figure(figsize=(16, 5))
    sns.barplot(data=gdf, x='group', y='conversion_rate')

    plt.ylabel('Conversion Rate')
    plt.tight_layout()
    plt.show()
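
    Each conversion_rate above is an estimate from a finite sample, so before testing formally it can help to attach a rough 95% confidence interval to each group (normal approximation to the binomial; a sketch based on the gdf computed above, not part of the original brief):

    # Rough 95% confidence interval for each group's conversion rate using the
    # normal approximation: p ± 1.96 * sqrt(p * (1 - p) / n).
    z = 1.96
    se = np.sqrt(gdf['conversion_rate'] * (1 - gdf['conversion_rate']) / gdf['total_count'])
    gdf['ci_lower'] = gdf['conversion_rate'] - z * se
    gdf['ci_upper'] = gdf['conversion_rate'] + z * se
    gdf[['group', 'conversion_rate', 'total_count', 'ci_lower', 'ci_upper']]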

    Q2. Can the increases observed be explained by randomness?

    Chi-Square Test

    def chi_squared_test(group1, group2):
        # 2x2 contingency table: [converted, not converted] for each group
        contingency_table = [
            [group1['total_converted'], group1['total_count'] - group1['total_converted']],
            [group2['total_converted'], group2['total_count'] - group2['total_converted']]
        ]

        chi2, p, _, _ = chi2_contingency(contingency_table)
        return chi2, p

    # Select the control group (old design, old images) by value rather than by
    # position, since gdf has been sorted by conversion rate.
    control = gdf[(gdf['treatment'] == 'NO') & (gdf['new_images'] == 'NO')].iloc[0]
    variants = gdf[(gdf['treatment'] != 'NO') | (gdf['new_images'] != 'NO')]

    # Compare each variant against the control group.
    test_results = []
    for _, variant in variants.iterrows():
        chi2, p = chi_squared_test(variant, control)
        result = {
            'Comparison': f"{variant['group']} vs. Control Group",
            'Chi-Squared': chi2,
            'P-Value': p
        }
        test_results.append(result)

    pd.DataFrame(test_results).sort_values("P-Value")
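
    As a cross-check, the proportions_ztest imported at the top can be applied to the same comparisons (a sketch reusing the variants and control selected above). For 2x2 tables the two tests are closely related; the p-values differ slightly because chi2_contingency applies Yates' continuity correction by default.

    # Two-sample z-test for proportions: does each variant's conversion rate
    # differ from the control group's by more than chance would explain?
    z_results = []
    for _, variant in variants.iterrows():
        counts = np.array([variant['total_converted'], control['total_converted']])
        nobs = np.array([variant['total_count'], control['total_count']])
        z_stat, p = proportions_ztest(counts, nobs)
        z_results.append({
            'Comparison': f"{variant['group']} vs. Control Group",
            'Z-Statistic': z_stat,
            'P-Value': p
        })

    pd.DataFrame(z_results).sort_values("P-Value")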