Project: Data-Driven Product Management: Conducting a Market Analysis
You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.
The Data
You are provided with a number of CSV files in the "Files/data" folder, which offer international and national-level data on Google Trends keyword searches related to fitness and related products.
workout.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |
three_keywords.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| 'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| 'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |
workout_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'workout_2018_2023' | Index representing the popularity of the keyword 'workout' during the 5-year period. |
three_keywords_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' during the 5-year period. |
| 'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' during the 5-year period. |
| 'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' during the 5-year period. |
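Before running the analysis, it can help to confirm that a loaded file matches the column descriptions above. The following is a minimal sketch, assuming pandas is available; `check_trend_columns` is a hypothetical helper, and the inline sample values are made up for illustration rather than taken from the project files.

```python
import io

import pandas as pd


def check_trend_columns(df, expected_columns, low=0, high=100):
    """Return a list of problems; an empty list means the frame looks as described."""
    problems = []
    # Every documented column should be present in the frame
    for col in expected_columns:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    # Numeric index columns should stay within the documented 0-100 scale
    present = [c for c in expected_columns if c in df.columns]
    numeric = df[present].select_dtypes("number")
    for col in numeric.columns:
        if not numeric[col].dropna().between(low, high).all():
            problems.append(f"values outside {low}-{high}: {col}")
    return problems


# Made-up sample in the shape of workout.csv (illustrative values only)
sample_csv = io.StringIO("month,workout_worldwide\n2018-03,59\n2018-04,61\n2018-05,57\n")
sample_df = pd.read_csv(sample_csv)
problems = check_trend_columns(sample_df, ["month", "workout_worldwide"])
print(problems)
```

The same call could be pointed at any of the four files by passing the column names from the matching table.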
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Load the global workout search trends data
workout_df = pd.read_csv("data/workout.csv")
# Display the first few rows to understand the structure
workout_df.head()
# Convert the month column to datetime format
workout_df["month"] = pd.to_datetime(workout_df["month"])
# Extract the year with the highest search interest
peak_year = workout_df.loc[workout_df["workout_worldwide"].idxmax(), "month"].year
# Save the result as a string in the format "yyyy"
year_str = str(peak_year)
year_str
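The peak year above is the year that contains the single highest monthly value. As a cross-check, one could also compare yearly averages, since one spike month and a sustained high year can point to different answers. This is a sketch of that alternative reading, not part of the original analysis; the `monthly` frame below uses made-up values for illustration.

```python
import pandas as pd

# Made-up monthly values in the shape of workout.csv (illustrative only)
monthly = pd.DataFrame({
    "month": pd.to_datetime(["2019-01", "2019-07", "2020-01", "2020-04"]),
    "workout_worldwide": [60, 58, 62, 100],
})

# Average the index within each calendar year
yearly_mean = monthly.groupby(monthly["month"].dt.year)["workout_worldwide"].mean()

# Year with the highest average interest, as a "yyyy" string
peak_year_by_mean = str(yearly_mean.idxmax())
print(peak_year_by_mean)
```

If both approaches agree on the real data, the peak-year result is less likely to be an artifact of a single outlier month.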
# Load the keyword trends data
keywords_df = pd.read_csv("data/three_keywords.csv")
# Display the first few rows to understand the structure
keywords_df.head()
# Convert the month column to datetime format
keywords_df["month"] = pd.to_datetime(keywords_df["month"])
# Filter data for the COVID-19 pandemic period (2020)
covid_period = keywords_df[keywords_df["month"].dt.year == 2020]
# Identify the most popular keyword during COVID-19 (2020)
peak_covid = covid_period.iloc[:, 1:].sum().idxmax()
# Identify the most popular keyword in the latest available data
latest_data = keywords_df.iloc[-1, 1:]
latest_data = latest_data.apply(pd.to_numeric, errors='coerce') # Convert to numeric, coercing errors to NaN
current = latest_data.idxmax()
peak_covid, current
# Load the workout interest by country data
workout_geo_df = pd.read_csv("data/workout_geo.csv")
# Display the first few rows to understand the structure
workout_geo_df.head()
# List of target countries
target_countries = ["United States", "Australia", "Japan"]
# Filter data for the selected countries
country_interest = workout_geo_df[workout_geo_df["country"].isin(target_countries)]
# Identify the country with the highest interest
top_country = country_interest.loc[country_interest["workout_2018_2023"].idxmax(), "country"]
top_country
# Load the home workout interest by country data
home_workout_geo_df = pd.read_csv("data/three_keywords_geo.csv")
# Display the first few rows to understand the structure
home_workout_geo_df.head()
# List of target countries
target_home_workout_countries = ["Philippines", "Malaysia"]
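# Filter data for the selected countries
# Note: this file's column is referenced as "Country" (capitalized), while the table
# above lists 'country'; use whichever matches the actual header in three_keywords_geo.csv
home_workout_interest = home_workout_geo_df[home_workout_geo_df["Country"].isin(target_home_workout_countries)]
# Identify the country with the highest interest in home workouts
home_workout_geo = home_workout_interest.loc[home_workout_interest["home_workout_2018_2023"].idxmax(), "Country"]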
home_workout_geo
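The two country comparisons above follow the same pattern: filter to a set of countries, then pick the row with the highest index value. A minimal sketch of a reusable version follows; `top_country_for` is a hypothetical helper, the `country_col` parameter is an assumption added so either column spelling can be passed, and the demo frame uses made-up values.

```python
import pandas as pd


def top_country_for(df, value_col, countries, country_col="country"):
    """Return the country in `countries` with the highest `value_col`, or None."""
    # Keep only the requested countries and drop rows with no measurement
    subset = df[df[country_col].isin(countries)].dropna(subset=[value_col])
    if subset.empty:
        return None
    return subset.loc[subset[value_col].idxmax(), country_col]


# Made-up data in the shape of workout_geo.csv (illustrative only)
demo = pd.DataFrame({
    "country": ["A", "B", "C"],
    "workout_2018_2023": [40.0, 75.0, None],
})
best = top_country_for(demo, "workout_2018_2023", ["A", "B"])
print(best)
```

Dropping missing values first avoids selecting a country that has no recorded index, and returning `None` makes the "no data" case explicit.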