Project: Data-Driven Product Management: Conducting a Market Analysis
You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.
The Data
You are provided with several CSV files in the "Files/data" folder containing worldwide and country-level Google Trends data on searches for fitness-related keywords.
workout.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |
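A minimal sketch of finding the peak year in workout.csv. It assumes 'month' holds strings such as '2018-03'; the notebook code below relies on the same format when it slices the first four characters. The helper name `peak_year` and the toy values are hypothetical. The sketch averages the index within each year instead of summing it. If the file covers only part of the first or last calendar year, a sum would understate those years simply because they contain fewer months.

```python
import pandas as pd


def peak_year(df, value_col="workout_worldwide"):
    """Return the year (as a 'yyyy' string) with the highest mean monthly index.

    Assumes df['month'] holds strings formatted like '2018-03'.
    """
    # Parse the month strings, then keep only the calendar year as text.
    years = (
        pd.to_datetime(df["month"], format="%Y-%m")
        .dt.year.astype(str)
        .rename("year")
    )
    # Mean rather than sum, so partially covered years are not penalized.
    yearly = df.groupby(years)[value_col].mean()
    return yearly.idxmax()


# Hypothetical toy data, not values from workout.csv.
toy = pd.DataFrame({
    "month": ["2019-11", "2019-12", "2020-01", "2020-04", "2021-01"],
    "workout_worldwide": [40, 45, 60, 100, 70],
})
print(peak_year(toy))  # 2020
```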
three_keywords.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| 'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| 'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |
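To compare the three keywords over time, one option is to aggregate each column per year and then ask which keyword leads in each year. This sketch assumes the same 'yyyy-mm' month format; the function name `leading_keyword_by_year` and the toy numbers are hypothetical. Note that the columns are passed to the groupby as a list: selecting several columns with a bare tuple is rejected by pandas 2.0 and later.

```python
import pandas as pd

KEYWORDS = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]


def leading_keyword_by_year(df):
    """Map each year to the keyword with the highest mean monthly index.

    Assumes df['month'] holds strings formatted like '2020-04'.
    """
    year = df["month"].str[:4].rename("year")
    # A list of column names (not a tuple) after groupby selects several columns.
    yearly = df.groupby(year)[KEYWORDS].mean()
    # idxmax(axis=1) returns, for each row (year), the name of the largest column.
    return yearly.idxmax(axis=1)


# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "month": ["2020-04", "2020-05", "2023-01", "2023-02"],
    "home_workout_worldwide": [100, 90, 20, 18],
    "gym_workout_worldwide": [30, 35, 60, 62],
    "home_gym_worldwide": [50, 45, 25, 24],
})
print(leading_keyword_by_year(toy))
```

With these toy values, 'home_workout_worldwide' leads in 2020 and 'gym_workout_worldwide' leads in 2023.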
workout_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'workout_2018_2023' | Index representing the popularity of the keyword 'workout' during the 5-year period. |
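With one row per country, picking the most interested market among a shortlist is a filter followed by `idxmax`. A minimal sketch; the helper name `top_country` and the toy rows are hypothetical. One toy value is missing to show that `Series.idxmax` skips NaN by default.

```python
import pandas as pd


def top_country(geo, candidates, value_col="workout_2018_2023"):
    """Return the candidate country with the highest keyword index."""
    subset = geo[geo["country"].isin(candidates)]
    # idxmax returns the row label of the maximum (NaN skipped);
    # .loc maps that label back to the country name.
    return subset.loc[subset[value_col].idxmax(), "country"]


# Hypothetical toy data, not the contents of workout_geo.csv.
geo = pd.DataFrame({
    "country": ["United States", "Australia", "Japan", "Canada"],
    "workout_2018_2023": [100.0, 80.0, None, 90.0],
})
print(top_country(geo, ["United States", "Australia", "Japan"]))  # United States
```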
three_keywords_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' during the 5-year period. |
| 'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' during the 5-year period. |
| 'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' during the 5-year period. |
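Because this file holds three keywords per country, it can also show each market's keyword mix, not just a single winner. The sketch below indexes by country, keeps the requested countries, and adds a column naming each country's leading keyword. The function name `keyword_mix`, the extra `leading` column, and the toy values are hypothetical; `.loc` raises a `KeyError` if a requested country is absent.

```python
import pandas as pd

GEO_KEYWORDS = ["home_workout_2018_2023", "gym_workout_2018_2023", "home_gym_2018_2023"]


def keyword_mix(geo, countries):
    """Return the keyword indices for `countries`, indexed by country name,
    plus a 'leading' column naming each country's most popular keyword."""
    table = geo.set_index("country").loc[countries, GEO_KEYWORDS]
    return table.assign(leading=table.idxmax(axis=1))


# Hypothetical toy data, not the contents of three_keywords_geo.csv.
geo = pd.DataFrame({
    "country": ["Philippines", "Malaysia", "Singapore"],
    "home_workout_2018_2023": [52, 47, 20],
    "gym_workout_2018_2023": [38, 49, 44],
    "home_gym_2018_2023": [10, 14, 36],
})
mix = keyword_mix(geo, ["Philippines", "Malaysia"])
print(mix)
print(mix["home_workout_2018_2023"].idxmax())  # Philippines
```

In this toy data the Philippines wins on home workouts, while Malaysia's leading keyword is gym workouts, the kind of contrast that a single-column comparison would hide.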
# Start coding here
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
wk = pd.read_csv("data/workout.csv")
tk = pd.read_csv('data/three_keywords.csv')
wg = pd.read_csv('data/workout_geo.csv')
tg = pd.read_csv('data/three_keywords_geo.csv')
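# Question 1: in which year did global interest in 'workout' peak?
# 'month' is assumed to hold 'yyyy-mm' strings, so the first four characters are the year
wk['year'] = wk['month'].str[:4]
# Sum of the monthly popularity index per year (an index, not a count of searches);
# a year with fewer months of data gets a lower sum
sum_wk = wk.groupby('year')['workout_worldwide'].sum()
plt.plot(sum_wk)
plt.ylabel("Sum of monthly 'workout' index")
plt.xlabel('Year')
plt.show()
year_str = '2020'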
# Question 2: which keyword was most popular during COVID, and which is most popular now?
tk['year'] = tk['month'].str[:4]
# Select several columns with a list; a bare tuple raises an error in pandas 2.0+
sum_tk = tk.groupby('year')[['home_workout_worldwide', 'gym_workout_worldwide', 'home_gym_worldwide']].sum()
sum_tk = sum_tk.sort_index()
colors = ['#1f77b4', '#ff7f0e', '#2ca02c']
for idx, column in enumerate(sum_tk.columns):
    plt.plot(sum_tk.index, sum_tk[column], label=column, color=colors[idx])
plt.xlabel('Year')
plt.ylabel('Sum of monthly popularity index')
plt.legend(title='Keywords')
plt.show()
peak_covid = 'home_workout_worldwide'
current = 'gym_workout_worldwide'
# Question 3: which of the United States, Australia, or Japan shows the most interest in workouts?
ct = wg[wg['country'].isin(['United States', 'Australia', 'Japan'])]
print(ct)
top_country = 'United States'
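# Question 4: which of the Philippines or Malaysia shows more interest in home workouts?
# The column is 'country' (lowercase); 'Country' raises a KeyError
tt = tg[tg['country'].isin(['Philippines', 'Malaysia'])]
print(tt)
home_workout_geo = 'Philippines'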
# Inspect the four raw datasets
print(wk)
print(wg)
print(tk)
print(tg)
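The loop over `sum_tk.columns` with a hand-picked color list can also be written with `DataFrame.plot`, which draws one line per column and takes colors from Matplotlib's default cycle. A minimal sketch: the helper name `plot_yearly` and the toy numbers are hypothetical, and the y-axis label assumes the frame holds yearly means of the 0-100 index. For yearly sums, as in the code above, the label would need to change.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_yearly(yearly, title=None):
    """Plot one line per column of a year-indexed DataFrame and return the Axes."""
    # DataFrame.plot draws every column and assigns colors automatically.
    ax = yearly.plot(marker="o", title=title)
    ax.set_xlabel("Year")
    ax.set_ylabel("Mean monthly Google Trends index (0-100)")
    ax.legend(title="Keyword")
    return ax


# Hypothetical yearly means, for illustration only.
toy = pd.DataFrame(
    {"home_workout_worldwide": [30.0, 95.0, 40.0],
     "gym_workout_worldwide": [45.0, 35.0, 60.0]},
    index=["2019", "2020", "2021"],
)
plot_yearly(toy, title="Keyword interest by year (toy data)")
plt.show()
```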