E-Commerce Churn Analysis
  • Non-Contactual Churn Analysis

    For this churn analysis, I will use the BetaGeoFitter from the lifetimes library, which fits the BG/NBD model. The model estimates whether a customer is still active from their past transactions, mainly how often they have placed orders and how long it has been since their last order. For example, if a customer places an order every 20 days on average and their last transaction was 5 days ago, their probability of being active is very high. If it has been 80 days since their last transaction, that probability is quite low.
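
    To make the intuition above concrete, here is a minimal sketch of the BG/NBD probability-alive formula. The function name probability_alive, the parameter values (r, alpha, a, b), and the customer history are assumptions made up for illustration; in practice the parameters come from fitting the model to the data.

```python
import math

def probability_alive(frequency, recency, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active.

    frequency: number of repeat purchases
    recency:   customer age (in days) at the last purchase
    T:         customer age (in days) at the end of the observation period
    """
    if frequency == 0:
        # With no repeat purchases the model treats the customer as alive.
        return 1.0
    odds_dead = (a / (b + frequency - 1)) * ((alpha + T) / (alpha + recency)) ** (r + frequency)
    return 1.0 / (1.0 + odds_dead)

# Hypothetical fitted parameters, chosen only for illustration.
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)

# A customer with 10 repeat orders over 200 days (about one every 20 days).
p_recent = probability_alive(frequency=10, recency=200, T=205, **params)  # last order 5 days ago
p_stale = probability_alive(frequency=10, recency=200, T=280, **params)   # last order 80 days ago

print(f"5 days since last order:  {p_recent:.2f}")
print(f"80 days since last order: {p_stale:.2f}")
```

    With these made-up parameters, the same purchase history yields a much lower probability once the gap since the last order grows well beyond the customer's usual 20-day rhythm.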

    Import Data and Libraries

    %%capture
    !pip install lifetimes #For DataCamp Workspace Only
    
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from matplotlib.pyplot import figure
    from lifetimes import BetaGeoFitter
    from lifetimes.utils import calibration_and_holdout_data
    from lifetimes.utils import summary_data_from_transaction_data
    from lifetimes.plotting import plot_frequency_recency_matrix
    from lifetimes.plotting import plot_probability_alive_matrix
    from lifetimes.plotting import plot_period_transactions
    from lifetimes.plotting import plot_history_alive
    from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases
    import warnings
    
    warnings.filterwarnings('ignore')

    Source of dataset.

    Citation: Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).

    data = pd.read_csv("online_retail.csv")

    Create a copy so the original data stays available if needed.

    df = data.copy()
    df

    For the model, we need four columns: Order ID, Customer ID, Date, and Sales Amount. The code below computes the sales amount, keeps those four columns, and drops all others.

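    # Line-level revenue: unit price times quantity
    df['SalesAmount'] = df['UnitPrice'] * df['Quantity']
    
    # Drop the columns the model does not need
    df1 = df.drop(columns = ['StockCode', 'Description', 'Country', 'UnitPrice', 'Quantity'])
    
    # Collapse line items into one row per invoice, summing SalesAmount
    df1 = df1.groupby(by = ['InvoiceNo', 'CustomerID', 'InvoiceDate'], as_index = False).sum()
    
    df1
    df1.sort_values(by = 'InvoiceDate', ascending = False )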

    RFM Metrics

    df_rfmt = summary_data_from_transaction_data(df, 
                                             'CustomerID', 
                                             'InvoiceDate', 
                                             'SalesAmount',
                                             observation_period_end='2011-09-09')
    
    df_rfmt
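
    The sketch below shows, in plain Python, how the four columns returned by summary_data_from_transaction_data are defined at its default daily frequency: frequency (number of repeat purchase days), recency (days between the first and last purchase), T (days between the first purchase and the end of the observation period), and monetary_value (average spend per repeat purchase day). The helper name rfm_summary and the transactions are hypothetical; the library's own implementation is vectorized and supports more options.

```python
from collections import defaultdict
from datetime import date

def rfm_summary(transactions, observation_period_end):
    """Compute frequency, recency, T and monetary_value per customer.

    transactions: iterable of (customer_id, purchase_date, amount)
    """
    # Sum spend per customer per day; several orders on one day count as one purchase.
    daily = defaultdict(lambda: defaultdict(float))
    for customer_id, day, amount in transactions:
        if day <= observation_period_end:  # ignore anything after the cutoff
            daily[customer_id][day] += amount

    summary = {}
    for customer_id, spend_by_day in daily.items():
        days = sorted(spend_by_day)
        first, last = days[0], days[-1]
        repeat_days = days[1:]  # the first purchase day is not a repeat
        frequency = len(repeat_days)
        monetary = sum(spend_by_day[d] for d in repeat_days) / frequency if frequency else 0.0
        summary[customer_id] = {
            "frequency": frequency,
            "recency": (last - first).days,
            "T": (observation_period_end - first).days,
            "monetary_value": monetary,
        }
    return summary

# Made-up transactions for two customers.
transactions = [
    ("A", date(2011, 1, 1), 20.0),
    ("A", date(2011, 1, 21), 30.0),
    ("A", date(2011, 1, 21), 10.0),   # same day as the previous order
    ("A", date(2011, 2, 10), 50.0),
    ("B", date(2011, 3, 1), 15.0),
    ("B", date(2011, 10, 1), 99.0),   # after the cutoff, so it is ignored
]

summary = rfm_summary(transactions, observation_period_end=date(2011, 9, 9))
for customer_id, row in summary.items():
    print(customer_id, row)
```

    Note that a customer with a single purchase day, like B here, gets a frequency and recency of 0, which is why many rows in the summary table can show zeros.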

    Distribution of RFM Metrics

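    # distplot is deprecated in recent seaborn releases; histplot with a KDE overlay replaces it
    ax = sns.histplot(df_rfmt['recency'], kde=True, stat='density')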