Project: Data-Driven Product Management: Conducting a Market Analysis

You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

The Data

The "Files/data" folder contains several CSV files with worldwide and country-level Google Trends data on searches for fitness keywords and related products.

workout.csv

Column	Description
`'month'`	Month when the data was measured.
`'workout_worldwide'`	Index representing the popularity of the keyword 'workout', on a scale of 0 to 100.

three_keywords.csv

Column	Description
`'month'`	Month when the data was measured.
`'home_workout_worldwide'`	Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100.
`'gym_workout_worldwide'`	Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100.
`'home_gym_worldwide'`	Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100.

workout_geo.csv

Column	Description
`'country'`	Country where the data was measured.
`'workout_2018_2023'`	Index representing the popularity of the keyword 'workout' during the five-year period from 2018 to 2023.

three_keywords_geo.csv

Column	Description
`'country'`	Country where the data was measured.
`'home_workout_2018_2023'`	Index representing the popularity of the keyword 'home workout' during the five-year period from 2018 to 2023.
`'gym_workout_2018_2023'`	Index representing the popularity of the keyword 'gym workout' during the five-year period from 2018 to 2023.
`'home_gym_2018_2023'`	Index representing the popularity of the keyword 'home gym' during the five-year period from 2018 to 2023.
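
Before analysing the files, it may help to confirm that each one matches the column layout described above. The following is a minimal sketch, not part of the original project: `EXPECTED_COLUMNS`, `missing_columns`, and `index_in_range` are hypothetical names, and the sample frame uses made-up values in place of the real workout.csv.

```python
import pandas as pd

# Expected columns per file, copied from the tables above.
EXPECTED_COLUMNS = {
    "workout.csv": ["month", "workout_worldwide"],
    "three_keywords.csv": [
        "month",
        "home_workout_worldwide",
        "gym_workout_worldwide",
        "home_gym_worldwide",
    ],
    "workout_geo.csv": ["country", "workout_2018_2023"],
    "three_keywords_geo.csv": [
        "country",
        "home_workout_2018_2023",
        "gym_workout_2018_2023",
        "home_gym_2018_2023",
    ],
}


def missing_columns(df, filename):
    """Return the documented columns that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS[filename] if col not in df.columns]


def index_in_range(series, low=0, high=100):
    """Return True if every non-missing value lies within [low, high]."""
    return bool(series.dropna().between(low, high).all())


# Made-up stand-in for workout.csv; the values are illustrative only.
sample = pd.DataFrame(
    {"month": ["2018-03", "2018-04"], "workout_worldwide": [59, 61]}
)

print(missing_columns(sample, "workout.csv"))
print(index_in_range(sample["workout_worldwide"]))
```

With real data, `sample` would be replaced by the frame returned from `pd.read_csv` for the corresponding file.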

# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

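# Find the year in which worldwide interest in 'workout' peaked
df = pd.read_csv("data/workout.csv")

# Sort so that the month with the highest search index comes first
df_sorted = df.sort_values("workout_worldwide", ascending=False)

# Parse the month strings and extract the calendar year
df_sorted["month_dt"] = pd.to_datetime(df_sorted["month"])
df_sorted["year"] = df_sorted["month_dt"].dt.year

# Year of the peak month, stored as a string
year_str = str(df_sorted["year"].iloc[0])
print(year_str)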

# Find the most popular keyword during the pandemic
df_key = pd.read_csv("data/three_keywords.csv")

# Pandemic window: March 2020 (WHO declares a pandemic) to May 2023
# (WHO declares the end of the public health emergency of international concern)
df_key["date"] = pd.to_datetime(df_key["month"])
df_covid = df_key[(df_key["date"] >= "2020-03-01") & (df_key["date"] <= "2023-05-01")]

# Highest index reached by each keyword within the window, largest first
peak_covid_test = df_covid[["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]].max()
peak_covid_test_sorted = peak_covid_test.sort_values(ascending=False)
print(peak_covid_test_sorted)

# Top keyword, read off the sorted output above
peak_covid = "home workout"

# The data ends in 2023, so "current" is taken to mean 2023,
# even though this overlaps with the pandemic window
df_current = df_key[df_key["date"] >= "2023-01-01"]

# Highest index reached by each keyword in 2023, largest first
peak_current_test = df_current[["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]].max()
peak_current_test_sorted = peak_current_test.sort_values(ascending=False)
print(peak_current_test_sorted)

# Top keyword, read off the sorted output above
current = "gym workout"

# Compare interest in 'workout' across the three countries of interest
df_country = pd.read_csv("data/workout_geo.csv")
countries = ["United States", "Australia", "Japan"]
df_country_subset = df_country[df_country["country"].isin(countries)]

print(df_country_subset.sort_values("workout_2018_2023", ascending=False))

# Country with the highest index, read off the sorted output above
top_country = "United States"
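
The keyword labels above are typed in by hand after reading the printed output. A minimal sketch of deriving such a label from the data instead is shown below. It assumes a hypothetical helper, `peak_keyword`, and a small frame of made-up values standing in for three_keywords.csv, so its results say nothing about the real data.

```python
import pandas as pd

KEYWORD_COLUMNS = [
    "home_workout_worldwide",
    "gym_workout_worldwide",
    "home_gym_worldwide",
]


def peak_keyword(df, start, end=None):
    """Return the keyword whose column reaches the highest value in [start, end].

    Hypothetical helper: column names follow three_keywords.csv, and the
    label is the column name with '_worldwide' removed and underscores
    replaced by spaces.
    """
    dates = pd.to_datetime(df["month"])
    mask = dates >= pd.Timestamp(start)
    if end is not None:
        mask &= dates <= pd.Timestamp(end)
    # Per-keyword maximum within the window, then the keyword with the largest one
    top_column = df.loc[mask, KEYWORD_COLUMNS].max().idxmax()
    return top_column.replace("_worldwide", "").replace("_", " ")


# Made-up values for illustration only; not taken from the real CSV.
sample = pd.DataFrame(
    {
        "month": ["2020-01-01", "2020-04-01", "2021-06-01", "2023-02-01"],
        "home_workout_worldwide": [12, 90, 30, 14],
        "gym_workout_worldwide": [20, 10, 25, 22],
        "home_gym_worldwide": [15, 60, 28, 13],
    }
)

print(peak_keyword(sample, "2020-03-01", "2023-05-01"))
print(peak_keyword(sample, "2023-01-01"))
```

Deriving the label this way keeps the answer in step with the data if the CSV is refreshed. The trade-off is that it relies on the column naming convention staying the same.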