Project: Data-Driven Product Management: Conducting a Market Analysis

You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

The Data

You are provided with several CSV files in the "Files/data" folder. They contain worldwide and country-level Google Trends data for fitness-related search keywords.

workout.csv

Column | Description
--- | ---
'month' | Month when the data was measured.
'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100.
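If the `month` column holds strings such as `2020-03` (an assumption, suggested by the commented-out `split('-')` call in the analysis code), parsing it into datetimes makes it easier to extract years and filter by date. A minimal sketch on a few made-up rows (not real Google Trends values):

```python
import io

import pandas as pd

# Made-up rows for illustration only; the real file is data/workout.csv.
# Assumes 'month' is formatted as YYYY-MM.
csv_text = """month,workout_worldwide
2019-12,52
2020-01,70
2020-04,100
2021-01,80
"""

# parse_dates converts the 'month' strings into datetime64 values
workout = pd.read_csv(io.StringIO(csv_text), parse_dates=["month"])

# The .dt accessor exposes calendar fields such as the year
workout["year"] = workout["month"].dt.year
print(workout.dtypes)
print(workout)
```

Against the real file, replace `io.StringIO(csv_text)` with the file path.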

three_keywords.csv

Column | Description
--- | ---
'month' | Month when the data was measured.
'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100.
'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100.
'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100.
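Assuming the three keyword columns come from a single Google Trends comparison and therefore share one scale, `DataFrame.idxmax(axis=1)` returns, for each month, the name of the column with the largest value, which is the leading keyword for that month. A sketch on made-up values:

```python
import pandas as pd

# Made-up values for illustration; real data lives in data/three_keywords.csv.
three_keywords = pd.DataFrame(
    {
        "month": ["2019-12", "2020-04", "2023-06"],
        "home_workout_worldwide": [10, 100, 15],
        "gym_workout_worldwide": [20, 30, 40],
        "home_gym_worldwide": [5, 60, 20],
    }
).set_index("month")

# For each month (row), the name of the keyword column with the highest index
leader_by_month = three_keywords.idxmax(axis=1)
print(leader_by_month)
```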

workout_geo.csv

Column | Description
--- | ---
'country' | Country where the data was measured.
'workout_2018_2023' | Index representing the popularity of the keyword 'workout' over the five-year period (2018 to 2023).

three_keywords_geo.csv

Column | Description
--- | ---
'country' | Country where the data was measured.
'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' over the five-year period (2018 to 2023).
'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' over the five-year period (2018 to 2023).
'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' over the five-year period (2018 to 2023).
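With countries as the index, a column-wise `idxmax()` reports which country has the highest index for each keyword, a quick way to spot candidate markets per product type. A sketch on made-up values, assuming a `country` column as listed in the table above:

```python
import pandas as pd

# Made-up values for illustration; real data lives in data/three_keywords_geo.csv.
geo = pd.DataFrame(
    {
        "country": ["Philippines", "Malaysia", "Japan"],
        "home_workout_2018_2023": [50.0, 40.0, None],
        "gym_workout_2018_2023": [30.0, 35.0, 20.0],
        "home_gym_2018_2023": [20.0, 25.0, 70.0],
    }
).set_index("country")

# Column-wise idxmax: the country with the highest index for each keyword.
# Missing values (NaN) are skipped by default.
leader_by_keyword = geo.idxmax()
print(leader_by_keyword)
```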
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
workout = pd.read_csv("data/workout.csv")

# When were global searches for 'workout' at their peak?
workout.plot(x="month", y="workout_worldwide", kind="line")
plt.show()

# Find the month with the highest index and keep its year (month values look like "YYYY-MM")
peak_month = workout.loc[workout["workout_worldwide"].idxmax(), "month"]
year_str = str(peak_month).split("-")[0]
print(year_str)

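# Which keyword was most popular during the COVID-19 pandemic, and which is most popular now?
covid_vs_now = pd.read_csv("data/three_keywords.csv")
# Use 'month' for the x-axis so the plot shows dates instead of row numbers
covid_vs_now.plot(x="month", kind="line")
plt.show()
peak_covid = "home_workout_worldwide"  # leading keyword during the pandemic, read from the plot
current = "gym_workout_worldwide"  # leading keyword in the most recent months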
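Rather than reading the answers off the plot, both can be derived from the data. A sketch on made-up values; the pandemic window (March 2020 to December 2021) is a hypothetical choice, and `month` is assumed to hold `YYYY-MM` strings, which sort chronologically as text:

```python
import pandas as pd

# Made-up values for illustration only.
covid_vs_now = pd.DataFrame(
    {
        "month": ["2019-06", "2020-04", "2021-01", "2023-06"],
        "home_workout_worldwide": [10, 100, 50, 12],
        "gym_workout_worldwide": [30, 35, 40, 45],
        "home_gym_worldwide": [8, 70, 30, 15],
    }
)
keywords = covid_vs_now.columns.drop("month")

# Hypothetical pandemic window; string bounds work because YYYY-MM sorts as text
in_covid = covid_vs_now["month"].between("2020-03", "2021-12")

# Keyword with the highest average index during the window
peak_covid = covid_vs_now.loc[in_covid, keywords].mean().idxmax()

# Keyword with the highest index in the most recent month
current = covid_vs_now.sort_values("month")[keywords].iloc[-1].idxmax()
print(peak_covid, current)
```

Averaging over the window, rather than taking a single peak month, keeps one spike from deciding the answer; both are defensible readings of "most popular".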

# Which of the United States, Australia, or Japan shows the highest interest in workouts?
workout_geo = pd.read_csv("data/workout_geo.csv")
selected = workout_geo[workout_geo["country"].isin(["United States", "Australia", "Japan"])]
selected.plot(kind="bar", x="country", y="workout_2018_2023")
plt.show()
top_country = "United States"
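The same answer can be computed instead of read from the bar chart: index by country, select the candidates, and take `idxmax()`. A sketch on made-up values:

```python
import pandas as pd

# Made-up values for illustration; real data lives in data/workout_geo.csv.
workout_geo = pd.DataFrame(
    {
        "country": ["United States", "Australia", "Japan", "Canada"],
        "workout_2018_2023": [90.0, 70.0, 20.0, 60.0],
    }
)

candidates = ["United States", "Australia", "Japan"]

# Index by country so idxmax returns a country name rather than a row number
subset = workout_geo.set_index("country").loc[candidates, "workout_2018_2023"]
top_country = subset.idxmax()
print(top_country)
```

Selecting with `.loc` and a list raises a `KeyError` if any candidate is missing from the data, which surfaces spelling mismatches early.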

# Which country shows higher interest in home workouts: the Philippines or Malaysia?
interest_home_workout = pd.read_csv("data/three_keywords_geo.csv")
highest_interest_home_workout = interest_home_workout.loc[
    interest_home_workout["Country"].isin(["Philippines", "Malaysia"]),
    ["Country", "home_workout_2018_2023"],
]
plt.bar(highest_interest_home_workout["Country"], highest_interest_home_workout["home_workout_2018_2023"])
plt.xlabel("Country")
plt.ylabel("Interest level (home workout index)")
plt.show()
home_workout_geo = "Philippines"
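The data dictionary lists this file's country column as 'country', while the cell above selects 'Country'. Normalizing header case right after loading makes the lookup work with either spelling. A sketch on made-up values, which also computes the answer instead of reading it from the chart:

```python
import pandas as pd

# Made-up values for illustration; real data lives in data/three_keywords_geo.csv.
geo = pd.DataFrame(
    {
        "Country": ["Philippines", "Malaysia", "Japan"],
        "home_workout_2018_2023": [50.0, 40.0, 10.0],
        "gym_workout_2018_2023": [30.0, 35.0, 20.0],
        "home_gym_2018_2023": [20.0, 25.0, 70.0],
    }
)

# Strip and lower-case headers so 'Country' and 'country' both become 'country'
geo.columns = geo.columns.str.strip().str.lower()

# Compare only the two countries of interest on the home workout index
pair = geo.set_index("country").loc[["Philippines", "Malaysia"], "home_workout_2018_2023"]
home_workout_geo = pair.idxmax()
print(home_workout_geo)
```

Lower-casing every header is harmless here because the keyword columns are already lower-case.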