You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.
The Data
You are provided with a number of CSV files in the "Files/data" folder. They contain international and national-level Google Trends data on keyword searches related to fitness and fitness products.
workout.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'workout_worldwide' | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |
three_keywords.csv
| Column | Description |
|---|---|
| 'month' | Month when the data was measured. |
| 'home_workout_worldwide' | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| 'gym_workout_worldwide' | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| 'home_gym_worldwide' | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |
workout_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'workout_2018_2023' | Index representing the popularity of the keyword 'workout' during the 5-year period. |
three_keywords_geo.csv
| Column | Description |
|---|---|
| 'country' | Country where the data was measured. |
| 'home_workout_2018_2023' | Index representing the popularity of the keyword 'home workout' during the 5-year period. |
| 'gym_workout_2018_2023' | Index representing the popularity of the keyword 'gym workout' during the 5-year period. |
| 'home_gym_2018_2023' | Index representing the popularity of the keyword 'home gym' during the 5-year period. |
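Every step of the analysis relies on these exact column names, so a quick schema check can catch a renamed or missing column early. The sketch below is hypothetical: `EXPECTED_COLUMNS` and `missing_columns` are not part of the project. It transcribes the tables above and assumes each file is read without `index_col`, because a column promoted to the index no longer appears in `df.columns`.

```python
import pandas as pd

# Expected columns per file, transcribed from the data tables (hypothetical helper).
EXPECTED_COLUMNS = {
    "workout.csv": ["month", "workout_worldwide"],
    "three_keywords.csv": [
        "month",
        "home_workout_worldwide",
        "gym_workout_worldwide",
        "home_gym_worldwide",
    ],
    "workout_geo.csv": ["country", "workout_2018_2023"],
    "three_keywords_geo.csv": [
        "country",
        "home_workout_2018_2023",
        "gym_workout_2018_2023",
        "home_gym_2018_2023",
    ],
}


def missing_columns(df: pd.DataFrame, filename: str) -> list:
    """Return the documented columns of `filename` that `df` lacks (empty list if none).

    Assumes the file was loaded without index_col; an index column is not in df.columns.
    """
    return [col for col in EXPECTED_COLUMNS[filename] if col not in df.columns]


# Tiny frame with made-up values (not real Google Trends data), only to show the call:
toy = pd.DataFrame({"month": ["2018-03"], "workout_worldwide": [59]})
print(missing_columns(toy, "workout.csv"))  # prints []
```

In the notebook, something like `missing_columns(workout_df, "workout.csv")` would be called right after loading.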
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
#importing data
workout_df = pd.read_csv('data/workout.csv')
threek_df = pd.read_csv('data/three_keywords.csv')
workout_geo_df = pd.read_csv('data/workout_geo.csv')
threek_geo_df = pd.read_csv('data/three_keywords_geo.csv')
Q1. When was the global search for 'workout' at its peak? Save the year of peak interest as a string named year_str in the format "yyyy".
#look at the data
print(workout_df.head())
#find the highest workout_worldwide index
print(workout_df['workout_worldwide'].max())
#visualise the trend
plt.plot(workout_df['month'], workout_df['workout_worldwide'])
plt.show()
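Plotting the raw `month` strings gives matplotlib one categorical tick per month, and the x-axis labels overlap. Converting `month` to datetimes lets matplotlib space and label the ticks as dates. The sketch below assumes the `month` values look like `"2018-03"`, which matches the `y_m[0:4]` slicing used below. The helper name `plot_trend` is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_trend(df: pd.DataFrame, value_col: str):
    """Plot one popularity index against a datetime x-axis.

    Assumes df['month'] holds "yyyy-mm" strings (hypothetical helper).
    """
    dates = pd.to_datetime(df["month"], format="%Y-%m")  # "2018-03" -> 2018-03-01
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(dates, df[value_col])
    ax.set_xlabel("month")
    ax.set_ylabel("popularity index (0-100)")
    ax.set_title(value_col)
    fig.autofmt_xdate()  # rotate and right-align the date labels so they do not overlap
    return fig, ax


# Hypothetical three-month frame, only to show the call; use plt.show() to display it.
toy = pd.DataFrame({"month": ["2020-01", "2020-02", "2020-03"], "workout_worldwide": [40, 45, 60]})
fig, ax = plot_trend(toy, "workout_worldwide")
```

Returning the `fig, ax` pair, rather than calling `plt.show()` inside the helper, lets the caller add annotations such as a vertical line at the peak before displaying the figure.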
#extract the row(s) where workout_worldwide equals its maximum (100)
row = workout_df[workout_df['workout_worldwide'] == workout_df['workout_worldwide'].max()]
print(row)
#get the month value of the first matching row (avoids hard-coding the row label 25)
y_m = row['month'].iloc[0]
#check type :: string
print(type(y_m))
#store the year of peak interest
year_str = y_m[0:4]
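The filter-then-index approach above needs an extra lookup to turn the matching row into a month. `Series.idxmax()` returns the index label of the maximum directly. If several months tie at the maximum, it returns the first one. The following is a minimal sketch. The function name `peak_year` is hypothetical, and it assumes each `month` string starts with a four-digit year.

```python
import pandas as pd


def peak_year(df: pd.DataFrame, value_col: str = "workout_worldwide") -> str:
    """Return the "yyyy" year of the month with the highest index value.

    idxmax() gives the label of the first row holding the maximum, so ties
    resolve to the earliest row. Assumes month strings like "2020-04".
    """
    peak_label = df[value_col].idxmax()
    return str(df.loc[peak_label, "month"])[:4]


# Made-up values purely to illustrate the call:
toy = pd.DataFrame({"month": ["2019-12", "2020-04", "2021-01"], "workout_worldwide": [55, 100, 80]})
print(peak_year(toy))  # prints 2020
```

Applied to the project data, `year_str = peak_year(workout_df)` would replace the filter, `iloc`, and slicing steps above.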
print(year_str)
Q2. Of the keywords available, what was the most popular during the covid pandemic, and what is the most popular now? Save your answers as variables called peak_covid and current respectively.
#find the most popular during the pandemic: peak_covid
#look at the data
print(workout_df.head())
print(threek_df.head())
print(threek_geo_df.head())
#plot the three keyword trends from three_keywords.csv on one chart
plt.plot(threek_df['month'],threek_df['home_workout_worldwide'], color = 'r', label = 'home workout')
plt.plot(threek_df['month'],threek_df['gym_workout_worldwide'], color = 'b', label = 'gym workout')
plt.plot(threek_df['month'],threek_df['home_gym_worldwide'], color = 'g', label = 'home gym')
#add label on axis
plt.xlabel("month")
plt.ylabel("popularity index")
plt.legend()
#display
plt.show()
#use subplot to draw each keyword in its own panel of a 2x2 grid
plt.subplot(2,2,1)
plt.plot(threek_df['month'],threek_df['home_workout_worldwide'], color = 'r', label = 'home workout')
plt.legend()
plt.subplot(2,2,2)
plt.plot(threek_df['month'],threek_df['gym_workout_worldwide'], color = 'b', label = 'gym workout')
plt.legend()
plt.subplot(2,2,3)
plt.plot(threek_df['month'],threek_df['home_gym_worldwide'], color = 'g', label = 'home gym')
plt.legend()
plt.show()
#the most popular during the pandemic is home workout
peak_covid = 'home workout'
#the most popular now is gym workout
current = 'gym workout'
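The answers above are read off the plots. A numeric check can confirm them. The sketch below averages each keyword over a date window and looks at the most recent month. The pandemic window (March 2020 to December 2021) is an assumption, because the project does not define one. The helper names and the `"yyyy-mm"` month format are also assumptions.

```python
import pandas as pd

KEYWORD_COLS = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]


def most_popular(df: pd.DataFrame, start: str, end: str) -> str:
    """Return the keyword column with the highest mean index from start to end (inclusive).

    String comparison works because "yyyy-mm" strings sort chronologically.
    """
    window = df[(df["month"] >= start) & (df["month"] <= end)]
    return window[KEYWORD_COLS].mean().idxmax()


def most_popular_latest(df: pd.DataFrame) -> str:
    """Return the keyword column with the highest index in the most recent month."""
    latest_row = df.sort_values("month")[KEYWORD_COLS].iloc[-1]
    return latest_row.idxmax()


def to_label(col: str) -> str:
    """Turn e.g. 'home_workout_worldwide' into 'home workout'."""
    return col.replace("_worldwide", "").replace("_", " ")


# Made-up values, only to illustrate the calls:
toy = pd.DataFrame({
    "month": ["2019-12", "2020-04", "2020-10", "2021-06", "2023-01"],
    "home_workout_worldwide": [10, 90, 70, 40, 20],
    "gym_workout_worldwide": [50, 30, 40, 50, 80],
    "home_gym_worldwide": [20, 60, 50, 30, 25],
})
print(to_label(most_popular(toy, "2020-03", "2021-12")))  # prints home workout
print(to_label(most_popular_latest(toy)))  # prints gym workout
```

A window mean is less sensitive to a single spike than the peak value, but the result can change with the window boundaries chosen.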
Q3. Which country has the highest interest in workouts among the following: the United States, Australia, or Japan? Save your answer as top_country.
#re-import workout_geo.csv with the 'country' column as the index
df_workout_geo = pd.read_csv("data/workout_geo.csv", index_col = 0)
#retrieve data using country as index
print(df_workout_geo.loc["United States"])
print(df_workout_geo.loc["Australia"])
print(df_workout_geo.loc["Japan"])
top_country = "United States"
Q4. You are interested in expanding your virtual home workout offering to either the Philippines or Malaysia. Which of the two countries has the higher interest in home workouts? Identify the country and save it as home_workout_geo.
#re-import three_keywords_geo.csv with the 'country' column as the index
threek_geo_df = pd.read_csv('data/three_keywords_geo.csv', index_col = 0)
#filtering using .loc
phil = threek_geo_df.loc["Philippines", "home_workout_2018_2023"]
print(phil)
malay = threek_geo_df.loc["Malaysia", "home_workout_2018_2023"]
print(malay)
#the Philippines has the higher interest, at 52
home_workout_geo = "Philippines"
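Q3 and Q4 follow the same pattern: look up a few countries in one column and keep the largest. A small helper can answer both without printing and comparing values by hand. The sketch below is hypothetical. It assumes the geo DataFrame is indexed by country name, as with `index_col = 0` above. Passing a list to `.loc` raises `KeyError` if any country is absent, and `idxmax()` skips missing (NaN) values by default.

```python
import pandas as pd


def top_country_for(geo_df: pd.DataFrame, countries: list, col: str) -> str:
    """Return whichever of `countries` has the highest value in `col`.

    Assumes geo_df is indexed by country name (hypothetical helper).
    """
    return geo_df.loc[countries, col].idxmax()


# Hypothetical countries and values, only to illustrate the call:
geo = pd.DataFrame(
    {"home_workout_2018_2023": [10, 30, 20]},
    index=pd.Index(["Country A", "Country B", "Country C"], name="country"),
)
print(top_country_for(geo, ["Country A", "Country C"], "home_workout_2018_2023"))  # prints Country C
```

On the project data, the calls would be `top_country_for(df_workout_geo, ["United States", "Australia", "Japan"], "workout_2018_2023")` and `top_country_for(threek_geo_df, ["Philippines", "Malaysia"], "home_workout_2018_2023")`.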