You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

The Data

You are provided with a number of CSV files in the "Files/data" folder, which offer international and national-level data on Google Trends keyword searches related to fitness and related products.

workout.csv

Column | Description
'month' | Month when the data was measured.
'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100.

three_keywords.csv

Column | Description
'month' | Month when the data was measured.
'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100.
'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100.
'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100.

workout_geo.csv

Column | Description
'country' | Country where the data was measured.
'workout_2018_2023' | Index representing the popularity of the keyword 'workout' during the 5-year period.

three_keywords_geo.csv

Column | Description
'country' | Country where the data was measured.
'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' during the 5-year period.
'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' during the 5-year period.
'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' during the 5-year period.
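
Before analyzing the files, it can help to confirm that each one actually contains the columns documented above. The following is a minimal sketch, not part of the original project: the `EXPECTED_COLUMNS` mapping and the `missing_columns` helper are hypothetical names, the comparison is deliberately case-insensitive (an assumption, in case a header is capitalized differently from the description), and the toy CSV uses invented values purely for illustration.

```python
import io

import pandas as pd

# Expected columns, copied from the data description above.
EXPECTED_COLUMNS = {
    "workout.csv": ["month", "workout_worldwide"],
    "three_keywords.csv": [
        "month",
        "home_workout_worldwide",
        "gym_workout_worldwide",
        "home_gym_worldwide",
    ],
    "workout_geo.csv": ["country", "workout_2018_2023"],
    "three_keywords_geo.csv": [
        "country",
        "home_workout_2018_2023",
        "gym_workout_2018_2023",
        "home_gym_2018_2023",
    ],
}


def missing_columns(df, filename):
    """Return the documented columns that are absent from df (case-insensitive)."""
    present = {col.lower() for col in df.columns}
    return [col for col in EXPECTED_COLUMNS[filename] if col.lower() not in present]


# Toy stand-in for a real file; the country and value are invented for illustration.
toy_csv = io.StringIO("Country,workout_2018_2023\nExampleland,50\n")
toy = pd.read_csv(toy_csv)

missing = missing_columns(toy, "workout_geo.csv")
print(missing)
```

With the real data, the same helper could be called on the result of `pd.read_csv('data/workout_geo.csv')`; an empty list would indicate that every documented column is present under some capitalization.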
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt



# Challenge 1: find the year in which global searches for 'workout' peaked
workout_global = pd.read_csv('data/workout.csv')
# Select the row(s) where the search index reaches its maximum
max_workout_global = workout_global.loc[workout_global['workout_worldwide'] == workout_global['workout_worldwide'].max()]
# Extract the year of the first peak month as a string
year_str = str(pd.to_datetime(max_workout_global['month']).dt.year.iloc[0])

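# Challenge 2: among the available keywords, find the most popular during COVID and the most popular now
keywords_available = pd.read_csv('data/three_keywords.csv')
# COVID window as chosen here: January 2019 through December 2020 (note that this includes pre-pandemic 2019 months).
# The string comparison assumes 'month' is stored in a sortable 'YYYY-MM' form.
period_COVID = keywords_available[(keywords_available['month'] >= '2019-01') & (keywords_available['month'] <= '2020-12')]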

# Keyword columns to compare
rates = ['home_workout_worldwide', 'gym_workout_worldwide', 'home_gym_worldwide']
# Sum each keyword's index over the COVID window and keep the keyword with the highest total
sum_rates_covid = period_COVID[rates].sum()
peak_covid = sum_rates_covid.idxmax()

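# Current period: January 2023 onward
current_period = keywords_available[keywords_available['month'] >= '2023-01']
# Sum each keyword's index over the current period and keep the keyword with the highest total
sum_rates_current = current_period[rates].sum()
current = sum_rates_current.idxmax()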


# Challenge 3: which of the United States, Australia, and Japan has the highest interest in workouts
workout_geo = pd.read_csv('data/workout_geo.csv')
list_countries = ['United States', 'Australia', 'Japan']
countries_filters = workout_geo[workout_geo['country'].isin(list_countries)]
workout_per_country = countries_filters.groupby('country')['workout_2018_2023'].sum()
top_country = workout_per_country.idxmax()

# Challenge 4: Philippines or Malaysia, which has the higher interest in home workouts?
three_keywords = pd.read_csv('data/three_keywords_geo.csv')
# Note: this code uses 'Country' (capitalized), while the data description lists 'country';
# use whichever spelling matches the file's actual header.
countries_to_work = three_keywords.loc[three_keywords['Country'].isin(['Philippines', 'Malaysia'])]
filters = countries_to_work.groupby('Country')['home_workout_2018_2023'].sum()
# idxmax() already returns the country name (the group label), so no second lookup is needed
home_workout_geo = str(filters.idxmax())
print(home_workout_geo)
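
The two keyword comparisons above (COVID window and current period) follow the same pattern: filter rows by month, aggregate each keyword column, and take the column with the largest value. A possible way to factor that into one reusable function is sketched below. The `top_keyword` helper is a hypothetical name, it compares means rather than sums (which gives the same ranking when no values are missing), it parses months as dates instead of comparing strings, and the toy figures are invented for illustration and are not taken from the real files.

```python
import pandas as pd


def top_keyword(df, columns, start=None, end=None):
    """Return the column with the highest mean index between start and end (inclusive)."""
    months = pd.to_datetime(df["month"])
    mask = pd.Series(True, index=df.index)
    if start is not None:
        mask &= months >= pd.Timestamp(start)
    if end is not None:
        mask &= months <= pd.Timestamp(end)
    return df.loc[mask, columns].mean().idxmax()


rates = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]

# Toy data with invented values, shaped like three_keywords.csv.
toy = pd.DataFrame(
    {
        "month": ["2020-03", "2020-04", "2023-01", "2023-02"],
        "home_workout_worldwide": [60, 80, 20, 22],
        "gym_workout_worldwide": [30, 20, 40, 45],
        "home_gym_worldwide": [25, 35, 15, 18],
    }
)

toy_peak = top_keyword(toy, rates, start="2020-01", end="2020-12")
toy_current = top_keyword(toy, rates, start="2023-01")
print(toy_peak, toy_current)
```

Applied to the real data, the helper would take `keywords_available` in place of `toy`, with the window boundaries set to whatever period is being treated as the pandemic.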