Skip to content
(Python) Project: Understanding Subscription Behaviors
  • AI Chat
  • Code
  • Report
  • Understanding Subscription Behaviors

    Analyze subscription data for a SaaS company to identify what drives customer subscription renewals.

    A SaaS company seeks to uncover what drives its clients to renew subscriptions. They’ve collected data on client details, subscription records, and economic indicators and would like to connect them to better understand its clients’ behavior.

    They’ve tasked you with analyzing these datasets to identify the key factors influencing clients’ decisions to renew their subscriptions.

    Your analysis will provide them with insights into which customers are renewing their products and the reasons behind their renewals. The company can leverage these insights to make informed decisions to increase renewal rates and improve customer loyalty, helping them stay competitive and ensure long-term growth.

    The Data

    The company have provided you with three datasets for your analysis. A summary of each data is provided below.

    client_details.csv

    ColumnDescription
    client_idUnique identifier for each client.
    company_sizeSize of the company (Small, Medium, Large).
    industryIndustry to which the client belongs (Fintech, Gaming, Crypto, AI, E-commerce).
    locationLocation of the client (New York, New Jersey, Pennsylvania, Massachusetts, Connecticut).

    subscription_records.csv

    ColumnDescription
    client_idUnique identifier for each client.
    subscription_typeType of subscription (Yearly, Monthly).
    start_dateStart date of the subscription - YYYY-MM-DD.
    end_dateEnd date of the subscription - YYYY-MM-DD.
    renewedIndicates whether the subscription was renewed (True, False).

    economic_indicators.csv

    ColumnDescription
    start_dateStart date of the economic indicator (Quarterly) - YYYY-MM-DD.
    end_dateEnd date of the economic indicator (Quarterly) - YYYY-MM-DD.
    inflation_rateInflation rate in the period.
    gdp_growth_rateGross Domestic Product (GDP) growth rate in the period.

    A SaaS company seeks to uncover what drives its clients to renew their subscriptions. Combine and analyze the company's data to provide insights into its customer behavior and trends, and answer the following questions:

    • How many total Fintech and Crypto clients does the company have? Store as an integer variable called total_fintech_crypto_clients
    • Which industry has the highest renewal rate? Store as a string variable called top_industry
    • For clients that renewed their subscriptions, what was the average inflation rate when their subscriptions were renewed? Store as a float variable called average_inflation_for_renewals

    1. Count the number of Fintech and Crypto clients

    Count the number of clients with an industry of Fintech or Crypto. Consider using a custom function that checks whether an industry category matches Fintech or Crypto.

    Mark which clients are in the Fintech or Crypto industries

    • The client_details table will be helpful for this task
    • You can use an if/else statement to check if a string is the desired industry type
    • You can use a boolean to mark if clients are or are not in the Fintech or Crypto industries

    Define a function

    • It may be helpful to define a function that labels whether a client's industry is Fintech or Crypto
    • You could apply this function to the columns directly, or iterate through the industry column and count the clients in the Fintech or Crypto industries

    Count the number of Fintech and Crypto clients

    • Store the total number of Fintech and Crypto clients an integer variable called total_fintech_crypto_clients
    # Import required libraries
    import pandas as pd
    
    # Import data
    client_details = pd.read_csv('data/client_details.csv')
    print('client_details', client_details.shape)
    
    subscription_records = pd.read_csv('data/subscription_records.csv', parse_dates=['start_date', 'end_date'])
    economic_indicators = pd.read_csv('data/economic_indicators.csv', parse_dates=['start_date', 'end_date'])
    subscription_df = pd.read_csv('data/subscription_records.csv')
    display(subscription_df.head())
    
    economic_df = pd.read_csv('data/economic_indicators.csv')
    display(economic_df.head())
    display(client_details.head())
    display(subscription_records.head())
    display(economic_indicators.head())
    # client details is 'crypto'
    
    client_details[client_details['industry']=='Crypto'].count()
    # Answer Q1
    
    # Define a function that returns 1 if the input is either 'Fintech' or 'Crypto', otherwise returning 0
    
    def is_fintech_or_crypto(x):
        if x in ['Fintech','Crypto']:
            return 1
        else:
            return 0
        
    # Loop through the 'industry' column in client_details and increment the total_fintech_crypto_clients counter for every Fintech or Crpyto client
    total_fintech_crypto_clients = 0
    for industry in client_details['industry']:
        total_fintech_crypto_clients += is_fintech_or_crypto(industry)
    # Alternate approach 1 - Apply the custom function directly to the 'industry' column to calculate the total number of Fintech and Crypto clients
    total_fintech_crypto_clients = client_details['industry'].apply(is_fintech_or_crypto).sum()
    total_fintech_crypto_clients
    # Alternate approach 2 - Use a lambda function to calculate the total number of Fintech and Crypto clients
    total_fintech_crypto_clients_2 = client_details['industry'].apply(lambda x: x in ['Fintech','Crypto']).sum()    
    total_fintech_crypto_clients_2
    print('client_details', client_details.shape, '\n')
    print('subscription_records', subscription_records.shape, '\n')
    print('economic_indicators', economic_indicators.shape)

    2. Find the industry with the highest renewal rate

    You’ll need to join subscription and client data in order to find the renewal rate of each industry.

    Merge the client and subscription data

    • The pd.merge() function can be used to join two datasets
    • The client_id is a common key across data
    • Keep in mind the type of join, which is determined by the how argument in pd.merge()

    Find the average renewal rate by industry

    • You can use .groupby() to group data on a single column
    • You can apply the .mean() function to a grouped object
    • You can group by the industry and take the .mean() of the renewed column to find the renewal rate of each industry

    Sort the industry renewal rates to find the industry with the highest renewal rate

    • You can sort the renewal rates by industry in descending order using .sort_values(ascending=False)
    • Store the result as a string variable called top_industry
    top_industry = pd.merge(subscription_records, client_details, on='client_id', how='left')
    top_industry.head()
    industry_renewal_rates = top_industry.groupby('industry')['renewed'].mean()
    industry_renewal_rates