A SaaS company seeks to uncover what drives its clients to renew subscriptions. They’ve collected data on client details, subscription records, and economic indicators and would like to connect them to better understand its clients’ behavior.
They’ve tasked you with analyzing these datasets to identify the key factors influencing clients’ decisions to renew their subscriptions.
Your analysis will provide them with insights into which customers are renewing their products and the reasons behind their renewals. The company can leverage these insights to make informed decisions to increase renewal rates and improve customer loyalty, helping them stay competitive and ensure long-term growth.
The Data
The company have provided you with three datasets for your analysis. A summary of each data is provided below.
client_details.csv
client_details.csv
Column | Description |
---|---|
client_id | Unique identifier for each client. |
company_size | Size of the company (Small, Medium, Large). |
industry | Industry to which the client belongs (Fintech, Gaming, Crypto, AI, E-commerce). |
location | Location of the client (New York, New Jersey, Pennsylvania, Massachusetts, Connecticut). |
subscription_records.csv
subscription_records.csv
Column | Description |
---|---|
client_id | Unique identifier for each client. |
subscription_type | Type of subscription (Yearly, Monthly). |
start_date | Start date of the subscription - YYYY-MM-DD. |
end_date | End date of the subscription - YYYY-MM-DD. |
renewed | Indicates whether the subscription was renewed (True, False). |
economic_indicators.csv
economic_indicators.csv
Column | Description |
---|---|
start_date | Start date of the economic indicator (Quarterly) - YYYY-MM-DD. |
end_date | End date of the economic indicator (Quarterly) - YYYY-MM-DD. |
inflation_rate | Inflation rate in the period. |
gdp_growth_rate | Gross Domestic Product (GDP) growth rate in the period. |
Import Libraries and Datasets
# import required libraries
import pandas as pd
# import data
client_details = pd.read_csv('data/client_details.csv')
subscription_records = pd.read_csv('data/subscription_records.csv', parse_dates = ['start_date','end_date'])
economic_indicators = pd.read_csv('data/economic_indicators.csv', parse_dates = ['start_date','end_date'])
# show the first rows of clients_details df
client_details.head()
Q1. How many total Fintech and Crypto clients does the company have? (use a function to count)
# define function to count clients in industries
def num_clients_in_industries(df, industries):
"""
Counts the number of clients in the specified industries.
Parameters:
df (pd.DataFrame): DataFrame containing client details.
industries (list): List of industries to count clients in.
Returns:
int: Number of clients in the specified industries.
"""
count = 0
for industry in industries:
count += df[df['industry'] == industry].shape[0]
return count
# list of industries to count clients
industries_to_count = ['Fintech', 'Crypto']
# count the number of clients in fintech and crypto
total_fintech_crypto_clients = num_clients_in_industries(client_details, industries_to_count)
print(f'The total number of clients in {" & ".join(industries_to_count)} industries is {total_fintech_crypto_clients}')
Q2. Which industry has the highest renewal rate?
# join client_details with subscription_records
merged_client_subscription_df = pd.merge(client_details, subscription_records, on='client_id', how='inner')
# calculate the average renewal rate by industry
avg_renewal_rate_by_industry = merged_client_subscription_df.groupby('industry').renewed.agg('mean').reset_index()
avg_renewal_rate_by_industry.columns = ['industry', 'average_renewal_rate']
# sort avg_renewal_rate_by_industry desc
sorted_avg_renewal_rate_by_industry = avg_renewal_rate_by_industry.sort_values('average_renewal_rate', ascending=False)
# get the top of sorted_avg_renewal_rate_by_industry
top_industry = sorted_avg_renewal_rate_by_industry.iloc[0][0]
print(f'The industry with the highest renewal rate is {top_industry}')
Q3. For clients that renewed their subscriptions, what was the average inflation rate when their subscriptions were renewed?
# sort subscription_records by end_date
subscription_records = subscription_records.sort_values('end_date')
# sort economic_indicators by start_date
economic_indicators = economic_indicators.sort_values('start_date')
# join subscription_records with economic_indicators
merged_subsc_economic_df = pd.merge_asof(subscription_records, economic_indicators, left_on='end_date', right_on='start_date', suffixes=('_sub','_econ'))
# filter merged_subsc_economic_df to keep only renewed
renewed_subsc_df = merged_subsc_economic_df[merged_subsc_economic_df['renewed'] == True]
# calculate average inflation_rate for renewed_subsc_df
average_inflation_for_renewals = renewed_subsc_df['inflation_rate'].mean().round(2)
print(f'The average inflation rate for renewals is {average_inflation_for_renewals}')