Project: Data-Driven Product Management: Conducting a Market Analysis

You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

The Data

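You are provided with several CSV files in the "Files/data" folder. They contain Google Trends data on fitness-related keyword searches, both worldwide and by country. Google Trends reports relative interest rather than raw search counts. In the monthly files, 100 marks the month when a term was most popular, and 50 means the term was half as popular. In the country files, 100 marks the country where the term made up the largest share of all searches.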

workout.csv

Column	Description
`'month'`	Month when the data was measured.
`'workout_worldwide'`	Index representing the popularity of the keyword 'workout', on a scale of 0 to 100.

three_keywords.csv

Column	Description
`'month'`	Month when the data was measured.
`'home_workout_worldwide'`	Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100.
`'gym_workout_worldwide'`	Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100.
`'home_gym_worldwide'`	Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100.

workout_geo.csv

Column	Description
`'country'`	Country where the data was measured.
`'workout_2018_2023'`	Index representing the popularity of the keyword 'workout' over the five-year period from 2018 to 2023.

three_keywords_geo.csv

Column	Description
`'country'`	Country where the data was measured.
`'home_workout_2018_2023'`	Index representing the popularity of the keyword 'home workout' over the five-year period from 2018 to 2023.
`'gym_workout_2018_2023'`	Index representing the popularity of the keyword 'gym workout' over the five-year period from 2018 to 2023.
`'home_gym_2018_2023'`	Index representing the popularity of the keyword 'home gym' over the five-year period from 2018 to 2023.

# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

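# Load the four Google Trends extracts
wk = pd.read_csv('data/workout.csv')             # 'workout' index, worldwide, by month
tk = pd.read_csv('data/three_keywords.csv')      # three keywords, worldwide, by month
wg = pd.read_csv('data/workout_geo.csv')         # 'workout' index by country
tg = pd.read_csv('data/three_keywords_geo.csv')  # three keywords by country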
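Before answering the questions, it can help to check that each file's headers match the data dictionary. It also helps to parse `'month'` as a date instead of slicing strings. The sketch below assumes `'month'` is stored as `YYYY-MM` text, which is consistent with, but not proven by, the `str[:4]` slicing used here. A small invented CSV string stands in for `data/workout.csv`. The names `csv_text`, `expected` and `demo_wk` are illustrative only.

```python
import io

import pandas as pd

# Invented CSV text standing in for data/workout.csv (not real Trends values).
# Assumption: 'month' is stored as 'YYYY-MM'.
csv_text = """month,workout_worldwide
2018-03,10
2018-04,20
2018-05,30
"""

demo_wk = pd.read_csv(io.StringIO(csv_text))

# Fail early if the headers differ from the data dictionary.
expected = {"month", "workout_worldwide"}
missing = expected - set(demo_wk.columns)
if missing:
    raise KeyError(f"missing columns: {sorted(missing)}")

# A real datetime column makes year extraction and date filtering type-safe.
demo_wk["month"] = pd.to_datetime(demo_wk["month"], format="%Y-%m")
demo_wk["year"] = demo_wk["month"].dt.year
print(demo_wk.dtypes)
```

If the real files use a different date layout, the `format` argument is the only line that needs to change.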

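# Question 1: in which year did worldwide interest in 'workout' peak?
wk['year'] = wk['month'].str[:4]  # 'month' starts with the four-digit year
sum_wk = wk.groupby('year')['workout_worldwide'].sum()
plt.plot(sum_wk)
plt.ylabel("Sum of monthly popularity index for 'workout'")
plt.xlabel('Year')
plt.show()
year_str = '2020'  # Question 1: answer read off the plot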
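Summing the monthly index per year, as above, favours years with more months in the data. A partly covered first or last year can therefore look artificially low. One alternative is to take the single highest month with `idxmax`, and to average rather than sum for a yearly view. The sketch below shows this on invented numbers, assuming `YYYY-MM` month strings. `demo_wk`, `peak_month`, `peak_year` and `mean_by_year` are illustrative names.

```python
import pandas as pd

# Invented toy data shaped like workout.csv; not real Google Trends values.
demo_wk = pd.DataFrame({
    "month": ["2019-11", "2019-12", "2020-01", "2020-04", "2021-01", "2023-02"],
    "workout_worldwide": [40, 38, 60, 100, 70, 45],
})

# The single month with the highest index; unaffected by how many months each year has.
peak_month = demo_wk.loc[demo_wk["workout_worldwide"].idxmax(), "month"]
peak_year = peak_month[:4]  # 'yyyy' string, the same format as year_str

# Yearly mean instead of sum, so years with fewer months are not penalised.
mean_by_year = (
    demo_wk.assign(year=demo_wk["month"].str[:4])
    .groupby("year")["workout_worldwide"]
    .mean()
)
print(peak_month, peak_year)
print(mean_by_year)
```

On real data the two views can disagree. When they do, the monthly peak is the more direct answer to "when was interest highest".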

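# Question 2: which keyword led during the pandemic, and which leads now?
tk['year'] = tk['month'].str[:4]
# Double brackets select several columns; a bare tuple of names raises an error in pandas 2.x
sum_tk = tk.groupby('year')[['home_workout_worldwide', 'gym_workout_worldwide', 'home_gym_worldwide']].sum()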
sum_tk = sum_tk.sort_index()
colors = ['#1f77b4', '#ff7f0e', '#2ca02c'] 
for idx, column in enumerate(sum_tk.columns):
    plt.plot(sum_tk.index, sum_tk[column], label=column, color=colors[idx])
plt.xlabel('Year')
plt.ylabel('Sum of monthly popularity index')
plt.legend(title='Keywords')
plt.show()
peak_covid = 'home_workout_worldwide'  # most popular during the COVID-19 pandemic
current = 'gym_workout_worldwide'      # most popular in the latest data
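The Question 2 answers above are read from the plot. They can also be computed, as long as "during the pandemic" and "now" are pinned down first. This sketch assumes the pandemic window is calendar years 2020 and 2021. It also assumes "now" means the most recent month in the file. Both are illustrative choices, not definitions from the project, and the data are invented.

```python
import pandas as pd

# Invented toy data shaped like three_keywords.csv ('month' as 'YYYY-MM').
demo_tk = pd.DataFrame({
    "month": ["2019-06", "2020-04", "2020-10", "2021-03", "2022-09", "2023-02"],
    "home_workout_worldwide": [10, 90, 40, 35, 15, 12],
    "gym_workout_worldwide": [30, 20, 25, 28, 45, 50],
    "home_gym_worldwide": [15, 60, 50, 30, 20, 18],
})
keywords = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]

# Assumed pandemic window: 2020-01 through 2021-12 (inclusive). Zero-padded
# 'YYYY-MM' strings sort chronologically, so a string range works here.
covid = demo_tk[demo_tk["month"].between("2020-01", "2021-12")]
covid_leader = covid[keywords].mean().idxmax()

# Assumed meaning of "now": the latest month present in the data.
latest_leader = demo_tk.sort_values("month")[keywords].iloc[-1].idxmax()

print(covid_leader, latest_leader)
```

Changing the window bounds, or averaging the last few months instead of taking one, is a one-line edit. It is worth checking whether the answer is sensitive to that choice.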


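# Question 3: which of these three countries shows the most interest in 'workout'?
ct = wg[wg['country'].isin(['United States', 'Australia', 'Japan'])]
print(ct)
top_country = 'United States'  # Question 3: answer read off the printed table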
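Question 3's answer does not have to be read off the printed table. Once the frame is indexed by country, `idxmax` can pick the leader directly. The frame below is invented toy data shaped like `workout_geo.csv`, with placeholder country names. `demo_wg`, `candidates`, `subset`, `ranking` and `leader` are illustrative names.

```python
import pandas as pd

# Invented toy data shaped like workout_geo.csv; placeholder countries, made-up values.
demo_wg = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "workout_2018_2023": [55.0, 80.0, None, 95.0],
})

candidates = ["Country A", "Country B", "Country C"]
subset = demo_wg[demo_wg["country"].isin(candidates)]

# Index by country so idxmax returns a name rather than a row number.
# Missing values are skipped by idxmax (skipna=True by default) and sorted last.
ranking = subset.set_index("country")["workout_2018_2023"].sort_values(ascending=False)
leader = ranking.idxmax()
print(ranking)
print(leader)
```

Printing `ranking` alongside `leader` keeps the full comparison visible, which helps when two countries are close.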


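# Question 4: does the Philippines or Malaysia show more interest in 'home workout'?
tg.columns = tg.columns.str.lower()  # the data dictionary names the column 'country' in lower case
tt = tg[tg['country'].isin(['Philippines', 'Malaysia'])]
print(tt)
home_workout_geo = 'Philippines'  # Question 4: answer read off the printed table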
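Questions 3 and 4 follow the same pattern. Each filters a geo table to a few countries and takes the one with the highest index in a given column. A hypothetical helper, `top_among`, could wrap that pattern. It also lower-cases the headers, which sidesteps any `'Country'`/`'country'` spelling difference between a file and the data dictionary. The toy frame and its values are invented.

```python
from __future__ import annotations

import pandas as pd


def top_among(geo: pd.DataFrame, countries: list[str], column: str) -> str:
    """Hypothetical helper: return the country in `countries` with the highest
    value in `column`. Headers are lower-cased first, so 'Country' and
    'country' both work; the input frame is not modified."""
    geo = geo.rename(columns=str.lower)
    subset = geo[geo["country"].isin(countries)]
    if subset.empty:
        raise ValueError("none of the requested countries are in the data")
    return subset.set_index("country")[column.lower()].idxmax()


# Invented toy data with a capitalised 'Country' header.
demo_tg = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "home_workout_2018_2023": [40.0, 65.0, 20.0],
    "gym_workout_2018_2023": [30.0, 25.0, 55.0],
})
print(top_among(demo_tg, ["Country A", "Country C"], "home_workout_2018_2023"))
```

With real data, a call such as `top_among(tg, ['Philippines', 'Malaysia'], 'home_workout_2018_2023')` would reproduce the Question 4 comparison.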