Project: Data-Driven Product Management: Conducting a Market Analysis — DataLab

You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

The Data

You are provided with several CSV files in the "Files/data" folder. They contain international and national-level Google Trends data on searches for fitness-related keywords.

workout.csv

Column	Description
`'month'`	Month when the data was measured.
`'workout_worldwide'`	Index representing the popularity of the keyword 'workout', on a scale of 0 to 100.

three_keywords.csv

Column	Description
`'month'`	Month when the data was measured.
`'home_workout_worldwide'`	Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100.
`'gym_workout_worldwide'`	Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100.
`'home_gym_worldwide'`	Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100.

workout_geo.csv

Column	Description
`'country'`	Country where the data was measured.
`'workout_2018_2023'`	Index representing the popularity of the keyword 'workout' during the five-year period from 2018 to 2023.

three_keywords_geo.csv

Column	Description
`'country'`	Country where the data was measured.
`'home_workout_2018_2023'`	Index representing the popularity of the keyword 'home workout' during the five-year period from 2018 to 2023.
`'gym_workout_2018_2023'`	Index representing the popularity of the keyword 'gym workout' during the five-year period from 2018 to 2023.
`'home_gym_2018_2023'`	Index representing the popularity of the keyword 'home gym' during the five-year period from 2018 to 2023.
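
Before analysing the files, it can help to confirm that each one actually contains the columns described above. The following is a minimal sketch of such a check. The `EXPECTED_COLUMNS` mapping is copied from the tables above, while the `missing_columns` helper and the `sample` frame are hypothetical names introduced only for illustration (the sample values are made up, not real data). The comparison ignores letter case, so a header such as `Country` would still match `country`.

```python
import pandas as pd

# Expected columns, copied from the data description above.
EXPECTED_COLUMNS = {
    "workout.csv": ["month", "workout_worldwide"],
    "three_keywords.csv": [
        "month",
        "home_workout_worldwide",
        "gym_workout_worldwide",
        "home_gym_worldwide",
    ],
    "workout_geo.csv": ["country", "workout_2018_2023"],
    "three_keywords_geo.csv": [
        "country",
        "home_workout_2018_2023",
        "gym_workout_2018_2023",
        "home_gym_2018_2023",
    ],
}


def missing_columns(df, filename):
    """Return the documented columns absent from df, ignoring letter case.

    Hypothetical helper for illustration; not part of the project.
    """
    present = {col.strip().lower() for col in df.columns}
    return [col for col in EXPECTED_COLUMNS[filename] if col.lower() not in present]


# Made-up frame standing in for three_keywords_geo.csv; not real data.
sample = pd.DataFrame({"Country": ["A"], "home_workout_2018_2023": [1.0]})
print(missing_columns(sample, "three_keywords_geo.csv"))
```

In practice the same helper could be called on each frame right after `pd.read_csv`, for example `missing_columns(workout, "workout.csv")`.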

# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

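# Load the monthly worldwide 'workout' data and inspect it
workout = pd.read_csv('data/workout.csv')
print(workout.head())
workout.info()  # info() prints its own summary and returns None
print(workout.describe())
print(workout["month"].max())  # latest month (string comparison, assuming 'yyyy-mm' values)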

# When was the global search for 'workout' at its peak?
# Plot the five months with the highest search index.
workout_max = workout.sort_values(by="workout_worldwide", ascending=False).head(5)
sns.catplot(x="month", y="workout_worldwide", data=workout_max, kind="bar")
plt.xticks(rotation=90)
plt.show()
year_str1 = "2020"  # peak year read off the chart (hard-coded)

# Find the peak year programmatically
max_row = workout.loc[workout["workout_worldwide"].idxmax()]  # idxmax() returns the index label of the maximum value
month_value = max_row['month']
print(month_value)

# Split the 'yyyy-mm' string at the hyphen, e.g. ["2020", "04"], and keep the first element (the year)
year_str = month_value.split('-')[0]

print(year_str)
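
As an alternative to splitting the string, the month could be parsed as a date so that the year comes from the timestamp itself. The sketch below assumes the `month` column holds `'yyyy-mm'` strings, as the `split('-')` step above implies. The `demo` frame and its values are made up for illustration and are not the real Google Trends data.

```python
import pandas as pd

# Made-up values for illustration only; not the real Google Trends data.
demo = pd.DataFrame({
    "month": ["2019-12", "2020-04", "2021-01"],
    "workout_worldwide": [60, 100, 75],
})

# Parse the 'yyyy-mm' strings into timestamps (format is an assumption).
demo["month"] = pd.to_datetime(demo["month"], format="%Y-%m")

# Select the row with the highest index value, then take its year as a string.
peak_row = demo.loc[demo["workout_worldwide"].idxmax()]
demo_year_str = str(peak_row["month"].year)
print(demo_year_str)
```

Parsing the dates also lets plotting libraries space the x-axis by time rather than treating each month as a separate category.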

# Which keyword was most popular during the COVID pandemic, and which is most popular now?

keywords = pd.read_csv('data/three_keywords.csv')
print(keywords.head())
keywords.info()  # info() prints its own summary and returns None
print(keywords.describe())

# Convert to long format: the column names become 'keyword' and the values become 'interest'
keywords_long = keywords.melt(
    id_vars=['month'],
    value_vars=['home_workout_worldwide', 'gym_workout_worldwide', 'home_gym_worldwide'],
    var_name='keyword',
    value_name='interest',
)

sns.relplot(x="month", y="interest", data=keywords_long, kind="line", hue="keyword")
plt.show()

peak_covid = "home_workout_worldwide"
current = "gym_workout_worldwide"
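
The two answers above are read off the line chart. They could also be derived from the wide table, as in the rough sketch below. It rests on two assumptions that are not stated in the project: calendar year 2020 is treated as the pandemic window, and "now" means the latest month in the data. The `demo_kw` frame and its values are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration only; not the real Google Trends data.
demo_kw = pd.DataFrame({
    "month": ["2019-06", "2020-04", "2021-06", "2023-03"],
    "home_workout_worldwide": [12, 70, 20, 14],
    "gym_workout_worldwide": [16, 10, 18, 22],
    "home_gym_worldwide": [10, 30, 15, 12],
}).set_index("month").sort_index()

# Assumption: treat calendar year 2020 as the pandemic window.
covid_window = demo_kw[demo_kw.index.str.startswith("2020")]

# Keyword whose highest value within the window is the largest.
demo_peak_covid = covid_window.max().idxmax()

# Keyword with the highest value in the most recent month (last row after sorting).
demo_current = demo_kw.iloc[-1].idxmax()

print(demo_peak_covid, demo_current)
```

A different window, or a mean instead of a maximum, might change the result, so the choice is worth stating alongside the answer.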

# Which country has the highest interest in workouts among the United States, Australia, and Japan?

workout_location = pd.read_csv('data/workout_geo.csv')
print(workout_location.head())
workout_location.info()  # info() prints its own summary and returns None

# Sort by interest, then keep only the three candidate countries
workout_location = workout_location.sort_values(by="workout_2018_2023", ascending=False)
workout_location = workout_location[workout_location["country"].isin(["United States", "Australia", "Japan"])]
print(workout_location)

top_country = "United States"
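
Rather than typing the answer after inspecting the printed table, the top country could be selected in code. The sketch below filters to the candidate countries, checks that none is missing (for example because of a spelling difference), and uses `nlargest` to pick the highest value. The `demo_geo` frame and its values are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration only; not the real Google Trends data.
demo_geo = pd.DataFrame({
    "country": ["United States", "Australia", "Japan", "Canada"],
    "workout_2018_2023": [100.0, 77.0, 1.0, 86.0],
})

candidates = ["United States", "Australia", "Japan"]
subset = demo_geo[demo_geo["country"].isin(candidates)]

# Guard against a misspelled or absent country name.
not_found = set(candidates) - set(subset["country"])
if not_found:
    raise ValueError(f"Countries not found in the data: {sorted(not_found)}")

# Keep the single row with the largest index value and read its country.
demo_top_country = subset.nlargest(1, "workout_2018_2023")["country"].iloc[0]
print(demo_top_country)
```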

# You are considering expanding your virtual home workout offering to either the Philippines or Malaysia. Which of the two countries has the higher interest in home workouts?

expand_workout = pd.read_csv('data/three_keywords_geo.csv')
print(expand_workout.head())

# Note: the code reads a capitalized 'Country' column, while the data description lists 'country'.
# If this raises a KeyError, check expand_workout.columns and adjust the name.
expand_workout = expand_workout.sort_values(by='home_workout_2018_2023', ascending=False)
expand_workout = expand_workout[expand_workout['Country'].isin(["Philippines", "Malaysia"])]

print(expand_workout[["Country", "home_workout_2018_2023"]])

home_workout_geo = "Philippines"
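
For an expansion decision it may be useful to see all three keywords side by side for the two countries, not only 'home workout'. The sketch below lower-cases the column headers first, so it works whether the file uses `Country` or `country`, and then looks up both countries by name. The `demo_expand` frame and its values are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration only; not the real Google Trends data.
demo_expand = pd.DataFrame({
    "Country": ["Philippines", "Malaysia", "Singapore"],
    "home_workout_2018_2023": [50.0, 45.0, 40.0],
    "gym_workout_2018_2023": [30.0, 35.0, 20.0],
    "home_gym_2018_2023": [20.0, 20.0, 40.0],
})

# Lower-case the headers so the lookup does not depend on capitalization.
demo_expand.columns = demo_expand.columns.str.lower()

# .loc with a list of labels raises KeyError if a country is missing.
comparison = demo_expand.set_index("country").loc[["Philippines", "Malaysia"]]
print(comparison)

# Country with the higher 'home workout' index among the two.
demo_home_workout_geo = comparison["home_workout_2018_2023"].idxmax()
print(demo_home_workout_geo)
```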