
You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

The Data

You are provided with several CSV files in the "Files/data" folder. They contain worldwide and country-level Google Trends data on keyword searches for fitness and related products.

workout.csv

| Column | Description |
|--------|-------------|
| 'month' | Month when the data was measured. |
| 'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |

three_keywords.csv

| Column | Description |
|--------|-------------|
| 'month' | Month when the data was measured. |
| 'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| 'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| 'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |

workout_geo.csv

| Column | Description |
|--------|-------------|
| 'country' | Country where the data was measured. |
| 'workout_2018_2023' | Index representing the popularity of the keyword 'workout' during the 5-year period. |

three_keywords_geo.csv

| Column | Description |
|--------|-------------|
| 'country' | Country where the data was measured. |
| 'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' during the 5-year period. |
| 'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' during the 5-year period. |
| 'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' during the 5-year period. |

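Before running the analysis, it can help to confirm that each file really has the columns described above, since a header that differs only in capitalization would raise a KeyError later. The following is a minimal sketch under stated assumptions: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here, and the inline sample is made up and merely stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical mapping of file name -> columns listed in the data description
EXPECTED_COLUMNS = {
    "workout.csv": ["month", "workout_worldwide"],
    "three_keywords.csv": [
        "month",
        "home_workout_worldwide",
        "gym_workout_worldwide",
        "home_gym_worldwide",
    ],
    "workout_geo.csv": ["country", "workout_2018_2023"],
    "three_keywords_geo.csv": [
        "country",
        "home_workout_2018_2023",
        "gym_workout_2018_2023",
        "home_gym_2018_2023",
    ],
}


def missing_columns(df, expected):
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]


# Made-up sample with a capitalized header, standing in for workout_geo.csv
sample = pd.read_csv(io.StringIO("Country,workout_2018_2023\nUnited States,100\n"))
missing = missing_columns(sample, EXPECTED_COLUMNS["workout_geo.csv"])
print(missing)
```

With the real files, the same check could be run on each DataFrame right after `pd.read_csv`.
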
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load the global workout search trends data
workout_df = pd.read_csv("data/workout.csv")

# Display the first few rows to understand the structure
workout_df.head()
# Convert the month column to datetime format
workout_df["month"] = pd.to_datetime(workout_df["month"])

# Find the year that contains the month with the highest search interest
peak_year = workout_df.loc[workout_df["workout_worldwide"].idxmax(), "month"].year

# Save the result as a string in the format "yyyy"
year_str = str(peak_year)
year_str
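
The peak above is the year containing the single highest month. An alternative reading of "peak year" is the year with the highest average interest, and the two can disagree. The sketch below illustrates the difference on made-up values (not the real Google Trends data); `demo`, `peak_month_year`, and `peak_mean_year` are names assumed for illustration only.

```python
import pandas as pd

# Made-up monthly values for illustration only
demo = pd.DataFrame({
    "month": pd.to_datetime(["2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01"]),
    "workout_worldwide": [90, 10, 60, 70],
})

# Year containing the single highest monthly value
peak_month_year = demo.loc[demo["workout_worldwide"].idxmax(), "month"].year

# Year with the highest average monthly value
yearly_mean = demo.groupby(demo["month"].dt.year)["workout_worldwide"].mean()
peak_mean_year = yearly_mean.idxmax()

print(peak_month_year, peak_mean_year)
```

Which definition is appropriate depends on the question being asked; the analysis here uses the single-month peak.
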
# Load the keyword trends data
keywords_df = pd.read_csv("data/three_keywords.csv")

# Display the first few rows to understand the structure
keywords_df.head()
# Convert the month column to datetime format
keywords_df["month"] = pd.to_datetime(keywords_df["month"])

# Filter data for the COVID-19 pandemic period (2020)
covid_period = keywords_df[keywords_df["month"].dt.year == 2020]

# Identify the most popular keyword during COVID-19 (2020)
peak_covid = covid_period.iloc[:, 1:].sum().idxmax()

# Identify the most popular keyword in the latest available data
latest_data = keywords_df.iloc[-1, 1:]
latest_data = latest_data.apply(pd.to_numeric, errors='coerce')  # Convert to numeric, coercing errors to NaN
current = latest_data.idxmax()

peak_covid, current
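
The numeric coercion above is needed because slicing one row out of a frame that also holds a datetime column can yield an object-typed Series. One possible alternative, sketched here on made-up values, is to move `month` into the index so that only numeric columns remain. The names `demo`, `indexed`, `peak_2020`, and `latest` are assumptions for illustration.

```python
import pandas as pd

# Made-up values for illustration only (not the real Google Trends data)
demo = pd.DataFrame({
    "month": pd.to_datetime(["2020-04-01", "2020-05-01", "2023-03-01"]),
    "home_workout_worldwide": [80, 70, 15],
    "gym_workout_worldwide": [10, 12, 20],
    "home_gym_worldwide": [30, 35, 14],
})

# With the dates in the index, every remaining column is numeric
indexed = demo.set_index("month")

# Keyword with the highest total interest in 2020
peak_2020 = indexed[indexed.index.year == 2020].sum().idxmax()

# Keyword with the highest interest in the most recent row
latest = indexed.iloc[-1].idxmax()

print(peak_2020, latest)
```
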
# Load the workout interest by country data
workout_geo_df = pd.read_csv("data/workout_geo.csv")

# Display the first few rows to understand the structure
workout_geo_df.head()
# List of target countries
target_countries = ["United States", "Australia", "Japan"]

# Filter data for the selected countries
country_interest = workout_geo_df[workout_geo_df["country"].isin(target_countries)]

# Identify the country with the highest interest
top_country = country_interest.loc[country_interest["workout_2018_2023"].idxmax(), "country"]
top_country
# Load the home workout interest by country data
home_workout_geo_df = pd.read_csv("data/three_keywords_geo.csv")

# Display the first few rows to understand the structure
home_workout_geo_df.head()
# List of target countries
target_home_workout_countries = ["Philippines", "Malaysia"]

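# Normalize the column names: the data description lists 'country', but the
# header in the file may be capitalized
home_workout_geo_df.columns = home_workout_geo_df.columns.str.lower()

# Filter data for the selected countries
home_workout_interest = home_workout_geo_df[home_workout_geo_df["country"].isin(target_home_workout_countries)]

# Identify the country with the highest interest in home workouts
home_workout_geo = home_workout_interest.loc[home_workout_interest["home_workout_2018_2023"].idxmax(), "country"]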
home_workout_geo
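
The two country comparisons follow the same filter-then-`idxmax` pattern, so they could be wrapped in one small helper. This is only a sketch: `top_country` is a hypothetical function, the default column name `"country"` is taken from the data description, and the values in `demo` are made up.

```python
import pandas as pd


def top_country(df, value_col, countries, country_col="country"):
    """Return the country from `countries` with the highest value in `value_col`."""
    subset = df[df[country_col].isin(countries)]
    if subset.empty:
        # Fail with a clear message instead of an obscure idxmax error
        raise ValueError("None of the requested countries are in the data")
    return subset.loc[subset[value_col].idxmax(), country_col]


# Made-up values for illustration only
demo = pd.DataFrame({
    "country": ["United States", "Australia", "Japan", "Philippines"],
    "workout_2018_2023": [100, 77, 1, 50],
})

result = top_country(demo, "workout_2018_2023", ["Australia", "Japan"])
print(result)
```
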