Project: Data-Driven Product Management: Conducting a Market Analysis
You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.
The Data
You are provided with several CSV files in the "Files/data" folder. They contain worldwide and country-level Google Trends data for searches on fitness-related keywords.
workout.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |
three_keywords.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| 'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| 'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |
workout_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'workout_2018_2023' | Index representing the popularity of the keyword 'workout' over the five-year period 2018-2023. |
three_keywords_geo.csv
| Column | Description |
|---|---|
| 'Country' | Country where the data was measured. |
| 'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' over the five-year period 2018-2023. |
| 'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' over the five-year period 2018-2023. |
| 'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' over the five-year period 2018-2023. |
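Before running the analysis, it can help to confirm that each file actually has the columns listed in the data dictionary. The sketch below is a hypothetical helper (`load_checked` and `EXPECTED_COLUMNS` are illustrative names, not part of the project template). It compares column names case-insensitively, an assumption made because the code below reads the geographic keyword file through a capitalized `Country` column.

```python
import io
from typing import Dict, List, Union

import pandas as pd

# Expected columns per file, copied from the data dictionary above.
EXPECTED_COLUMNS: Dict[str, List[str]] = {
    "workout.csv": ["month", "workout_worldwide"],
    "three_keywords.csv": [
        "month",
        "home_workout_worldwide",
        "gym_workout_worldwide",
        "home_gym_worldwide",
    ],
    "workout_geo.csv": ["country", "workout_2018_2023"],
    "three_keywords_geo.csv": [
        "country",
        "home_workout_2018_2023",
        "gym_workout_2018_2023",
        "home_gym_2018_2023",
    ],
}


def load_checked(source: Union[str, io.StringIO], name: str) -> pd.DataFrame:
    """Read a CSV and raise ValueError if any documented column is missing.

    Hypothetical helper: names are compared case-insensitively (an assumption),
    so a 'Country' header satisfies the documented 'country' column.
    """
    df = pd.read_csv(source)
    present = {str(col).lower() for col in df.columns}
    missing = [col for col in EXPECTED_COLUMNS[name] if col.lower() not in present]
    if missing:
        raise ValueError(f"{name} is missing columns: {missing}")
    return df
```

For example, `df = load_checked("data/workout.csv", "workout.csv")` behaves like `pd.read_csv` but fails early with a readable message if the file's header differs from the data dictionary.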
# Import the necessary libraries
import pandas as pd
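import matplotlib.pyplot as plt

# Find the year in which worldwide interest in "workout" peaked
df = pd.read_csv("data/workout.csv")
# Sort so the month with the highest index comes first
df_sorted = df.sort_values("workout_worldwide", ascending=False)
# Parse the month strings so the year can be extracted
df_sorted["month_dt"] = pd.to_datetime(df_sorted["month"])
df_sorted["year"] = df_sorted["month_dt"].dt.year
# Year of the top-ranked month, as a string
year_str = str(df_sorted["year"].iloc[0])
print(year_str)

# Compare the three keywords during the pandemic and in the most recent period
df_key = pd.read_csv("data/three_keywords.csv")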
# Pandemic window: start March 2020 (WHO declares a pandemic); "end" May 2023 (WHO ends the public health emergency of international concern)
df_key["date"] = pd.to_datetime(df_key["month"])
df_covid = df_key[(df_key["date"] >= "2020-03-01") & (df_key["date"] <= "2023-05-01")]
df_key.head()
peak_covid_test = df_covid[["home_workout_worldwide","gym_workout_worldwide", "home_gym_worldwide"]].max()
peak_covid_test_sorted = peak_covid_test.sort_values(ascending = False)
print(peak_covid_test_sorted)
peak_covid = "home workout"
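The keyword comparison above, and the 2023 comparison that follows, share one pattern: restrict `three_keywords.csv` to a date range, take each keyword's maximum, and pick the largest. A minimal sketch of that pattern as a single function, assuming the `month` column holds values `pd.to_datetime` can parse (such as `2020-03`); the names `peak_keyword` and `KEYWORD_COLUMNS` are hypothetical.

```python
from typing import Optional

import pandas as pd

# Keyword columns from three_keywords.csv
KEYWORD_COLUMNS = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]


def peak_keyword(df: pd.DataFrame, start: str, end: Optional[str] = None) -> str:
    """Return the keyword whose index reaches the highest value in [start, end].

    Hypothetical helper; assumes 'month' values can be parsed by pd.to_datetime.
    If end is None, the window is open-ended.
    """
    dates = pd.to_datetime(df["month"])
    mask = dates >= pd.Timestamp(start)
    if end is not None:
        mask = mask & (dates <= pd.Timestamp(end))
    # Maximum per keyword inside the window, then the column holding the largest one
    best_col = df.loc[mask, KEYWORD_COLUMNS].max().idxmax()
    # e.g. "home_workout_worldwide" -> "home workout"
    return best_col.replace("_worldwide", "").replace("_", " ")
```

Calling `peak_keyword(df_key, "2020-03-01", "2023-05-01")` mirrors the pandemic window chosen above, and `peak_keyword(df_key, "2023-01-01")` mirrors the "current" window. Because `idxmax` returns the first maximum, a tie goes to whichever keyword comes first in `KEYWORD_COLUMNS`.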
# The data only runs until 2023, so "current" is taken to mean 2023, even though this overlaps with the COVID-19 window
df_current = df_key[df_key["date"] >= "2023-01-01"]
peak_current_test = df_current[["home_workout_worldwide","gym_workout_worldwide", "home_gym_worldwide"]].max()
peak_current_test_sorted = peak_current_test.sort_values(ascending = False)
print(peak_current_test_sorted)
current = "gym workout"

df_country = pd.read_csv("data/workout_geo.csv")
df_country_subset = df_country[df_country["country"].isin(["United States", "Australia", "Japan"])]
print(df_country_subset.sort_values("workout_2018_2023", ascending = False))
top_country = "United States"

df_home = pd.read_csv("data/three_keywords_geo.csv")
df_home_subset = df_home[df_home["Country"].isin(["Philippines", "Malaysia"])]
print(df_home_subset.groupby("Country")["home_workout_2018_2023"].max())
home_workout_geo = "Philippines"
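The two country comparisons also share one pattern: keep only the countries of interest, then return the one with the highest index. A sketch of that pattern, using `Series.isin` for the filter and `idxmax` to locate the winning row. `highest_country` is a hypothetical name (chosen so it does not overwrite the `top_country` variable above), and the `country_col` parameter exists because the two geo files are read with differently capitalized country columns in the code above.

```python
from typing import List

import pandas as pd


def highest_country(df: pd.DataFrame, value_col: str, countries: List[str],
                    country_col: str = "country") -> str:
    """Return which of `countries` has the highest value in `value_col`.

    Hypothetical helper; NaN values are skipped by idxmax, and an empty
    selection raises an error rather than returning a country.
    """
    subset = df[df[country_col].isin(countries)]
    # idxmax gives the row label of the maximum; look up that row's country name
    return subset.loc[subset[value_col].idxmax(), country_col]
```

For example, `highest_country(df_country, "workout_2018_2023", ["United States", "Australia", "Japan"])` reproduces the first comparison, and `highest_country(df_home, "home_workout_2018_2023", ["Philippines", "Malaysia"], country_col="Country")` reproduces the second without printing and reading the table by eye.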