Project: Data-Driven Product Management: Conducting a Market Analysis
You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.
The Data
You are provided with a number of CSV files in the "Files/data" folder, which offer international and national-level data on Google Trends keyword searches related to fitness and related products.
workout.csv
Column | Description |
---|---|
'month' | Month when the data was measured. |
'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |
three_keywords.csv
Column | Description |
---|---|
'month' | Month when the data was measured. |
'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |
workout_geo.csv
Column | Description |
---|---|
'country' | Country where the data was measured. |
'workout_2018_2023' | Index representing the popularity of the keyword 'workout' over the five-year period from 2018 to 2023. |
three_keywords_geo.csv
Column | Description |
---|---|
'country' | Country where the data was measured. |
'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' over the five-year period from 2018 to 2023. |
'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' over the five-year period from 2018 to 2023. |
'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' over the five-year period from 2018 to 2023. |
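Column names in the actual CSV files may not match the tables above exactly (for example in capitalization). A minimal sketch of one way to guard against that is to normalize the headers on load. The helper name `read_csv_normalized` and the sample rows are hypothetical and used only for illustration; they are not part of the project data.

```python
import io

import pandas as pd


def read_csv_normalized(source):
    """Read a CSV and strip/lowercase its column headers (hypothetical helper)."""
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip().str.lower()
    return df


# Made-up rows standing in for a geo file whose header is capitalized.
sample = io.StringIO(
    "Country,home_workout_2018_2023\n"
    "Philippines,10\n"
    "Malaysia,20\n"
)

geo_df = read_csv_normalized(sample)
print(list(geo_df.columns))
```

With headers normalized this way, every DataFrame can be filtered on `"country"` regardless of how the file spells it.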
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Load the four datasets
workout_df = pd.read_csv("data/workout.csv")
three_keywords_df = pd.read_csv("data/three_keywords.csv")
workout_geo_df = pd.read_csv("data/workout_geo.csv")
three_keywords_geo_df = pd.read_csv("data/three_keywords_geo.csv")
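# 1) Find the row where global 'workout' interest peaked
peak_index = workout_df["workout_worldwide"].idxmax()
# 'month' is still a string here, so its first four characters are the year
year_str = workout_df.loc[peak_index, "month"][:4]
print(f"1) Global search interest in 'workout' peaked in {year_str}.")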
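The year lookup above relies on slicing the first four characters of the month string. As a hedged alternative, the sketch below parses the dates first and reads the year from the timestamp, which does not depend on the string layout. The function name `peak_year` and the toy values are assumptions for illustration only.

```python
import pandas as pd


def peak_year(df, value_col, date_col="month"):
    """Return the calendar year of the row with the largest value (hypothetical helper)."""
    dates = pd.to_datetime(df[date_col])
    return int(dates.loc[df[value_col].idxmax()].year)


# Toy data, not taken from workout.csv.
toy = pd.DataFrame(
    {
        "month": ["2019-12", "2020-04", "2021-01"],
        "workout_worldwide": [50, 100, 70],
    }
)

print(peak_year(toy, "workout_worldwide"))
```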
# 2) Compare the three keywords during the pandemic and in the latest month
three_keywords_df['month'] = pd.to_datetime(three_keywords_df['month'])
# Treat calendar year 2020 as the COVID-19 period
covid_period = three_keywords_df[three_keywords_df['month'].dt.year == 2020]
latest_month = three_keywords_df['month'].max()
latest_data = three_keywords_df[three_keywords_df['month'] == latest_month]
# The column with the highest mean index is the most popular keyword
peak_covid = covid_period[["home_workout_worldwide","gym_workout_worldwide","home_gym_worldwide"]].mean().idxmax()
current = latest_data[["home_workout_worldwide","gym_workout_worldwide","home_gym_worldwide"]].mean().idxmax()
print(f'2) The most popular keyword during the COVID-19 pandemic was "{peak_covid}". Currently, the most popular keyword is "{current}".')
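The two lookups above repeat the same pattern: select rows for a period, average each keyword column, and take the column with the highest mean. A small sketch of that pattern as a reusable function follows. The name `top_keyword` and the toy numbers are hypothetical, and 2020 is again assumed to stand for the pandemic period.

```python
import pandas as pd


def top_keyword(df, mask, cols):
    """Return the column in `cols` with the highest mean over rows where `mask` is True."""
    return df.loc[mask, cols].mean().idxmax()


# Toy data, not taken from three_keywords.csv.
toy = pd.DataFrame(
    {
        "month": pd.to_datetime(["2020-04-01", "2020-08-01", "2023-03-01"]),
        "home_workout_worldwide": [80, 40, 20],
        "gym_workout_worldwide": [10, 20, 30],
        "home_gym_worldwide": [30, 35, 25],
    }
)
keyword_cols = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]

during_2020 = top_keyword(toy, toy["month"].dt.year == 2020, keyword_cols)
latest = top_keyword(toy, toy["month"] == toy["month"].max(), keyword_cols)
print(during_2020, latest)
```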
# 3) Compare overall 'workout' interest across three countries
countries = ["United States","Australia","Japan"]
countries_interest = workout_geo_df[workout_geo_df["country"].isin(countries)]
top_country = countries_interest.loc[countries_interest['workout_2018_2023'].idxmax(), 'country']
print(f"3) Among the United States, Australia, and Japan, interest in 'workout' is highest in {top_country}.")
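# 4) Compare keyword interest between the Philippines and Malaysia
countries_asia = ["Philippines","Malaysia"]
# Note: this code uses "Country" (capitalized), while the data description lists 'country'; match the actual CSV header
countries_filtered = three_keywords_geo_df[three_keywords_geo_df["Country"].isin(countries_asia)]
# Keyword with the highest mean index across the two countries
most_popular_keyword_asia = countries_filtered[['home_workout_2018_2023', 'gym_workout_2018_2023', 'home_gym_2018_2023']].mean().idxmax()
# Country with the higher index for that keyword
home_workout_geo = countries_filtered.loc[countries_filtered[most_popular_keyword_asia].idxmax(), "Country"]
print(f"4) The country with the highest interest in {most_popular_keyword_asia} is: {home_workout_geo}.")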