
You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

The Data

You are provided with a number of CSV files in the "Files/data" folder, which offer international and national-level data on Google Trends keyword searches related to fitness and related products.

workout.csv

Column: Description
'month': Month when the data was measured.
'workout_worldwide': Index representing the popularity of the keyword 'workout', on a scale of 0 to 100.

three_keywords.csv

Column: Description
'month': Month when the data was measured.
'home_workout_worldwide': Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100.
'gym_workout_worldwide': Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100.
'home_gym_worldwide': Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100.

workout_geo.csv

Column: Description
'country': Country where the data was measured.
'workout_2018_2023': Index representing the popularity of the keyword 'workout' during the 5-year period.

three_keywords_geo.csv

Column: Description
'country': Country where the data was measured.
'home_workout_2018_2023': Index representing the popularity of the keyword 'home workout' during the 5-year period.
'gym_workout_2018_2023': Index representing the popularity of the keyword 'gym workout' during the 5-year period.
'home_gym_2018_2023': Index representing the popularity of the keyword 'home gym' during the 5-year period.

Importing the necessary libraries

# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Reading the worldwide workout CSV file

# Read the workout file and store it in the ww_workout variable
ww_workout = pd.read_csv('data/workout.csv')
ww_workout.head()

Creating a plot to visualize the trend in search popularity of the keyword "workout" over the years

# Extract the year directly from the 'month' column and create a new 'year' column
ww_workout['year'] = ww_workout['month'].str[:4]

# Group by year and sum the monthly workout_worldwide index values
ww_workout_byYear = ww_workout.groupby('year')['workout_worldwide'].sum().reset_index()

# Plot the yearly totals as a line chart using matplotlib
plt.figure(figsize=(6, 4))
plt.plot(ww_workout_byYear['year'], ww_workout_byYear['workout_worldwide'], marker='o')
plt.title('"Workout" search popularity by year')
plt.xlabel('Year')
plt.ylabel('Sum of monthly popularity index')
plt.show()
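One caveat with summing a 0 to 100 index per year: a year with fewer months in the file will show a smaller total even if interest was unchanged. The sketch below uses made-up numbers (not values from workout.csv) to compare the sum with the mean, which is one possible alternative if the first or last year in the data turns out to be incomplete.

```python
import pandas as pd

# Hypothetical monthly index values (NOT taken from workout.csv):
# 2021 has four months, 2022 only two, both at the same average level.
toy = pd.DataFrame({
    "month": ["2021-01", "2021-02", "2021-03", "2021-04", "2022-01", "2022-02"],
    "workout_worldwide": [50, 60, 70, 60, 55, 65],
})
toy["year"] = toy["month"].str[:4]

# Sum and mean side by side: the sum shrinks for the shorter year,
# while the mean stays comparable across years.
yearly = toy.groupby("year")["workout_worldwide"].agg(["sum", "mean"]).reset_index()
print(yearly)
```

If every year in the real file has twelve months, the sum and the mean rank the years identically and the original approach is fine.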

As the chart shows, the peak was in 2020. We can verify this by finding the month with the highest index value and extracting its year:

# idxmax returns an index label, so look the row up with .loc rather than .iloc
ind_maxRow = ww_workout['workout_worldwide'].idxmax()
maxRow = ww_workout.loc[ind_maxRow]
year_str = maxRow['month'][:4]
year_str

Since the peak occurred in 2020, we can perform a detailed analysis of that year to identify any trends in the data.

ww_workout_2020 = ww_workout[ww_workout['month'].str[:4] == '2020']
plt.figure(figsize=(12,4))
plt.plot(ww_workout_2020['month'], ww_workout_2020['workout_worldwide'])
plt.title('"Workout" search popularity in 2020 by month')
plt.xlabel('Month')
plt.ylabel('Popularity index (0-100)')
plt.show()

The chart shows the popularity index of the keyword "workout" throughout 2020. There is a pronounced peak in April, just one month after the onset of the Covid-19 pandemic. This surge likely reflects increased interest in workouts as people adapted to lockdowns and social distancing measures. After the peak, the index trends downward, which may indicate declining interest or people settling into new routines. By October, the index had fallen to a level almost equal to that of March, suggesting that search interest had stabilized. In November the index reached its lowest point, and it started to recover in December.
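The peak and low months read off the chart can also be checked programmatically. The sketch below is one way to do it; `month_extremes` is a hypothetical helper, it assumes the 'month' column holds strings that start with 'YYYY-MM' (as the slicing above implies), and the demo values are made up rather than taken from the real file.

```python
import pandas as pd

def month_extremes(df, year, value_col="workout_worldwide"):
    """Return (peak_month, lowest_month) labels for a single year."""
    one_year = df[df["month"].str[:4] == str(year)]
    peak = one_year.loc[one_year[value_col].idxmax(), "month"]
    low = one_year.loc[one_year[value_col].idxmin(), "month"]
    return peak, low

# Hypothetical values for illustration only (not the real data)
toy = pd.DataFrame({
    "month": ["2020-03", "2020-04", "2020-05", "2020-11", "2020-12"],
    "workout_worldwide": [70, 100, 90, 60, 65],
})
peak_month, low_month = month_extremes(toy, 2020)
print(peak_month, low_month)
```

On the real data, the call would be `month_extremes(ww_workout, 2020)`.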

Of the keywords available, what was the most popular during the covid pandemic, and what is the most popular now? Save your answers as variables called peak_covid and current respectively.

pop_word = pd.read_csv('data/three_keywords.csv')
# sort_values returns a new DataFrame, so assign the result back
pop_word = pop_word.sort_values(by='month')
pop_word
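One possible way to answer the question is to compare the mean index of each keyword inside a time window and pick the highest. The sketch below is an assumption-laden illustration: `top_keyword` is a hypothetical helper, the choice of March to December 2020 as the "covid" window is an assumption, 'month' is assumed to hold sortable 'YYYY-MM' strings, and the demo values are made up rather than read from three_keywords.csv.

```python
import pandas as pd

def top_keyword(df, start, end, cols):
    """Column with the highest mean index for start <= month <= end."""
    window = df[(df["month"] >= start) & (df["month"] <= end)]
    return window[cols].mean().idxmax()

cols = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]

# Hypothetical values for illustration only (not the real data)
toy = pd.DataFrame({
    "month": ["2020-03", "2020-04", "2023-02", "2023-03"],
    "home_workout_worldwide": [40, 80, 15, 14],
    "gym_workout_worldwide": [20, 10, 25, 26],
    "home_gym_worldwide": [15, 30, 12, 11],
})

# Assumed covid window: March to December 2020
peak_toy = top_keyword(toy, "2020-03", "2020-12", cols)

# "Now" is taken to be the most recent month in the data
latest = toy["month"].max()
current_toy = top_keyword(toy, latest, latest, cols)
print(peak_toy, current_toy)
```

With the real data, the same calls on `pop_word` would give the values to store in `peak_covid` and `current`. The helper returns column names, so the `_worldwide` suffix may need to be stripped if the answer is expected as a plain keyword.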