Project: Data-Driven Product Management: Conducting a Market Analysis
You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.
The Data
You are provided with a number of CSV files in the "Files/data" folder, which offer international and national-level data on Google Trends keyword searches related to fitness and related products.
workout.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |
three_keywords.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| 'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| 'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |
workout_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'workout_2018_2023' | Index representing the popularity of the keyword 'workout' during the 5-year period. |
three_keywords_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' during the 5-year period. |
| 'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' during the 5-year period. |
| 'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' during the 5-year period. |
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
workout = pd.read_csv('data/workout.csv')
print(workout.head())
print(workout.info())
print(workout.describe())
print(workout["month"].max())
# When was global search interest in 'workout' at its peak?
workout_max = workout.sort_values(by="workout_worldwide",ascending=False).head(5)
sns.catplot(x="month",y="workout_worldwide",data=workout_max,kind="bar")
plt.xticks(rotation=90)
plt.show()
# Answer read off the chart above, stored as a string like year_str below
year_str1 = "2020"
# Derive the same year programmatically
max_row = workout.loc[workout["workout_worldwide"].idxmax()]  # idxmax() returns the index label of the maximum value
month_value = max_row['month']
print(month_value)
# Split the string at the hyphen, e.g. "2020-04" -> ["2020", "04"], and keep the first element (the year)
year_str = month_value.split('-')[0]
print(year_str)
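The string split above relies on `month` being formatted as `YYYY-MM`. A slightly more defensive sketch parses the column as dates first and reads the year from the parsed value. The small DataFrame below is invented illustrative data in the same shape as `workout.csv`, not values taken from the file, and the names `sample`, `peak_row`, and `peak_year` are hypothetical.

```python
import pandas as pd

# Hypothetical sample shaped like workout.csv (values are invented).
sample = pd.DataFrame({
    "month": ["2019-12", "2020-03", "2020-04", "2021-01"],
    "workout_worldwide": [55, 80, 100, 70],
})

# Parse the month strings; this raises if a value does not match YYYY-MM.
sample["month"] = pd.to_datetime(sample["month"], format="%Y-%m")

# Row with the highest search index.
peak_row = sample.loc[sample["workout_worldwide"].idxmax()]

# Keep the year as a string, matching year_str above.
peak_year = str(peak_row["month"].year)
print(peak_year)
```

Parsing up front also gives a proper datetime column, which tends to produce cleaner axis labels when plotting.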
# Which keyword was most popular during the COVID-19 pandemic, and which is most popular now?
keywords = pd.read_csv('data/three_keywords.csv')
print(keywords.head())
print(keywords.info())
print(keywords.describe())
# Convert to long format: column names become 'keyword', their values become 'interest'
keywords_long = keywords.melt(
    id_vars=['month'],
    value_vars=['home_workout_worldwide', 'gym_workout_worldwide', 'home_gym_worldwide'],
    var_name='keyword',
    value_name='interest',
)
sns.relplot(x="month",y="interest" ,data=keywords_long,kind="line",hue="keyword")
plt.show()
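To make the wide-to-long reshaping concrete, here is a minimal sketch of what `melt` does on a two-row table. The data is invented for illustration and the names `wide` and `long_df` are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one column per keyword (values are invented).
wide = pd.DataFrame({
    "month": ["2020-04", "2020-05"],
    "home_workout_worldwide": [70, 60],
    "gym_workout_worldwide": [10, 11],
})

# Each (month, keyword) pair becomes its own row.
long_df = wide.melt(id_vars=["month"], var_name="keyword", value_name="interest")
print(long_df)
```

Two months and two keyword columns yield four rows, which is the shape seaborn's `hue` argument expects when drawing one line per keyword.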
peak_covid = "home_workout_worldwide"
current = "gym_workout_worldwide"
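The two answers above are read off the line chart. As a rough sketch, they could also be computed, assuming the pandemic window is taken as March to December 2020 and "now" means the most recent row in the data. Both of those choices are assumptions, the sample values are invented, and the names ending in `_example` are hypothetical.

```python
import pandas as pd

# Hypothetical sample shaped like three_keywords.csv (values are invented).
sample = pd.DataFrame({
    "month": ["2019-06", "2020-04", "2020-05", "2023-02", "2023-03"],
    "home_workout_worldwide": [12, 70, 60, 14, 13],
    "gym_workout_worldwide": [15, 10, 11, 20, 21],
    "home_gym_worldwide": [11, 30, 28, 12, 12],
})
sample["month"] = pd.to_datetime(sample["month"], format="%Y-%m")
indexed = sample.set_index("month").sort_index()

# Assumed pandemic window; adjust the dates to the period you care about.
covid = indexed.loc["2020-03":"2020-12"]
peak_covid_example = covid.mean().idxmax()

# "Now" is assumed to be the latest month available.
current_example = indexed.iloc[-1].idxmax()

print(peak_covid_example, current_example)
```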
# Which country has the highest interest in workouts: the United States, Australia, or Japan?
workout_location = pd.read_csv('data/workout_geo.csv')
print(workout_location.head())
print(workout_location.info())
workout_location = workout_location.sort_values(by="workout_2018_2023",ascending = False)
workout_location = workout_location[workout_location["country"].isin(["United States","Australia","Japan"])].head()
print(workout_location)
top_country = "United States"
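Rather than reading the answer from the printed table, the top country can be selected directly. This is a minimal sketch on invented data shaped like `workout_geo.csv`; `sample`, `candidates`, and `top_country_example` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample shaped like workout_geo.csv (values are invented).
sample = pd.DataFrame({
    "country": ["United States", "Australia", "Japan", "Canada"],
    "workout_2018_2023": [100.0, 77.0, 1.0, 86.0],
})

candidates = ["United States", "Australia", "Japan"]
subset = sample[sample["country"].isin(candidates)]

# Look up the country name on the row holding the largest index value.
top_country_example = subset.loc[subset["workout_2018_2023"].idxmax(), "country"]
print(top_country_example)
```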
# You are interested in expanding your virtual home workout offering to either the Philippines or Malaysia. Which of the two countries has the higher interest in home workouts?
expand_workout = pd.read_csv('data/three_keywords_geo.csv')
print(expand_workout.head())
expand_workout = expand_workout.sort_values(by='home_workout_2018_2023',ascending=False)
expand_workout = expand_workout[expand_workout['Country'].isin(["Philippines","Malaysia"])]
print(expand_workout[["Country","home_workout_2018_2023"]])
home_workout_geo = "Philippines"
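The same filter-and-compare pattern appears in both geographic questions, so it could be wrapped in a small helper. The sketch below is one possible shape, not part of the original project: `highest_interest` is a hypothetical function, the sample values are invented, and the `Country` column name follows the code above. Rows with missing values are dropped before comparing.

```python
import pandas as pd

def highest_interest(df, country_col, value_col, countries):
    """Return the country in `countries` with the largest value in `value_col`."""
    subset = df[df[country_col].isin(countries)].dropna(subset=[value_col])
    if subset.empty:
        raise ValueError("none of the requested countries have data")
    return subset.loc[subset[value_col].idxmax(), country_col]

# Hypothetical sample shaped like three_keywords_geo.csv (values are invented).
sample = pd.DataFrame({
    "Country": ["Philippines", "Malaysia", "Gibraltar"],
    "home_workout_2018_2023": [52.0, 47.0, None],
})

result = highest_interest(
    sample, "Country", "home_workout_2018_2023", ["Philippines", "Malaysia"]
)
print(result)
```

Passing the column names as arguments keeps the helper usable for both geographic files, whatever their header spelling.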