Understanding Subscription Behaviors
Analyze subscription data for a SaaS company to identify what drives customer subscription renewals.
A SaaS company seeks to uncover what drives its clients to renew subscriptions. They’ve collected data on client details, subscription records, and economic indicators and would like to connect them to better understand its clients’ behavior.
They’ve tasked you with analyzing these datasets to identify the key factors influencing clients’ decisions to renew their subscriptions.
Your analysis will provide them with insights into which customers are renewing their products and the reasons behind their renewals. The company can leverage these insights to make informed decisions to increase renewal rates and improve customer loyalty, helping them stay competitive and ensure long-term growth.
The Data
The company have provided you with three datasets for your analysis. A summary of each data is provided below.
client_details.csv
client_details.csv
Column | Description |
---|---|
client_id | Unique identifier for each client. |
company_size | Size of the company (Small, Medium, Large). |
industry | Industry to which the client belongs (Fintech, Gaming, Crypto, AI, E-commerce). |
location | Location of the client (New York, New Jersey, Pennsylvania, Massachusetts, Connecticut). |
subscription_records.csv
subscription_records.csv
Column | Description |
---|---|
client_id | Unique identifier for each client. |
subscription_type | Type of subscription (Yearly, Monthly). |
start_date | Start date of the subscription - YYYY-MM-DD. |
end_date | End date of the subscription - YYYY-MM-DD. |
renewed | Indicates whether the subscription was renewed (True, False). |
economic_indicators.csv
economic_indicators.csv
Column | Description |
---|---|
start_date | Start date of the economic indicator (Quarterly) - YYYY-MM-DD. |
end_date | End date of the economic indicator (Quarterly) - YYYY-MM-DD. |
inflation_rate | Inflation rate in the period. |
gdp_growth_rate | Gross Domestic Product (GDP) growth rate in the period. |
A SaaS company seeks to uncover what drives its clients to renew their subscriptions. Combine and analyze the company's data to provide insights into its customer behavior and trends, and answer the following questions:
- How many total Fintech and Crypto clients does the company have? Store as an integer variable called
total_fintech_crypto_clients
- Which industry has the highest renewal rate? Store as a string variable called
top_industry
- For clients that renewed their subscriptions, what was the average inflation rate when their subscriptions were renewed? Store as a float variable called
average_inflation_for_renewals
1. Count the number of Fintech and Crypto clients
Count the number of clients with an industry
of Fintech
or Crypto
. Consider using a custom function that checks whether an industry category matches Fintech
or Crypto
.
Mark which clients are in the Fintech or Crypto industries
- The
client_details
table will be helpful for this task - You can use an if/else statement to check if a string is the desired industry type
- You can use a boolean to mark if clients are or are not in the
Fintech
orCrypto
industries
Define a function
- It may be helpful to define a function that labels whether a client's industry is
Fintech
orCrypto
- You could apply this function to the columns directly, or iterate through the
industry
column and count the clients in theFintech
orCrypto
industries
Count the number of Fintech and Crypto clients
- Store the total number of
Fintech
andCrypto
clients an integer variable calledtotal_fintech_crypto_clients
# Import required libraries
import pandas as pd
# Import data
client_details = pd.read_csv('data/client_details.csv')
print('client_details', client_details.shape)
subscription_records = pd.read_csv('data/subscription_records.csv', parse_dates=['start_date', 'end_date'])
economic_indicators = pd.read_csv('data/economic_indicators.csv', parse_dates=['start_date', 'end_date'])
subscription_df = pd.read_csv('data/subscription_records.csv')
display(subscription_df.head())
economic_df = pd.read_csv('data/economic_indicators.csv')
display(economic_df.head())
display(client_details.head())
display(subscription_records.head())
display(economic_indicators.head())
# client details is 'crypto'
client_details[client_details['industry']=='Crypto'].count()
# Answer Q1
# Define a function that returns 1 if the input is either 'Fintech' or 'Crypto', otherwise returning 0
def is_fintech_or_crypto(x):
if x in ['Fintech','Crypto']:
return 1
else:
return 0
# Loop through the 'industry' column in client_details and increment the total_fintech_crypto_clients counter for every Fintech or Crpyto client
total_fintech_crypto_clients = 0
for industry in client_details['industry']:
total_fintech_crypto_clients += is_fintech_or_crypto(industry)
# Alternate approach 1 - Apply the custom function directly to the 'industry' column to calculate the total number of Fintech and Crypto clients
total_fintech_crypto_clients = client_details['industry'].apply(is_fintech_or_crypto).sum()
total_fintech_crypto_clients
# Alternate approach 2 - Use a lambda function to calculate the total number of Fintech and Crypto clients
total_fintech_crypto_clients_2 = client_details['industry'].apply(lambda x: x in ['Fintech','Crypto']).sum()
total_fintech_crypto_clients_2
print('client_details', client_details.shape, '\n')
print('subscription_records', subscription_records.shape, '\n')
print('economic_indicators', economic_indicators.shape)
2. Find the industry with the highest renewal rate
You’ll need to join subscription and client data in order to find the renewal rate of each industry.
Merge the client and subscription data
- The
pd.merge()
function can be used to join two datasets - The
client_id
is a common key across data - Keep in mind the type of join, which is determined by the
how
argument inpd.merge()
Find the average renewal rate by industry
- You can use
.groupby()
to group data on a single column - You can apply the
.mean()
function to a grouped object - You can group by the
industry
and take the.mean()
of therenewed
column to find the renewal rate of each industry
Sort the industry renewal rates to find the industry with the highest renewal rate
- You can sort the renewal rates by industry in descending order using
.sort_values(ascending=False)
- Store the result as a string variable called
top_industry
top_industry = pd.merge(subscription_records, client_details, on='client_id', how='left')
top_industry.head()
industry_renewal_rates = top_industry.groupby('industry')['renewed'].mean()
industry_renewal_rates