You are a product manager for a fitness studio based in Singapore and are interested in understanding the types of digital products you should offer. You plan to conduct a market analysis in Python to understand how to place your digital fitness products in the regional market. A market analysis will allow you to identify strengths of your competitors, gauge demand, and create unique new digital products and services for potential users.
You are provided with a number of CSV files in the "data" folder (under Files), which contain international Google Trends and YouTube keyword-search data on fitness and related products. Two helper functions, read_file and read_geo, are also provided to help you process and visualize these CSV files for further analysis.
You'll use pandas methods to explore this data and drive your product management insights.
You can continue beyond the bounds of this project and also investigate in-person classes, local gyms, and online personal instructors!
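The wide-to-long reshaping that the read_file helper performs can be sketched on a small made-up table. The CSV text, keywords, and values below are illustrative assumptions, not the contents of the project's data files, and the plotting step is left out.

```python
import io
import pandas as pd

# Hypothetical stand-in for a Google Trends export: a title line,
# then the header row, then one row per week. All values are made up.
raw = """Category: All categories
Week,workout: (Worldwide),yoga: (Worldwide)
2020-03-01,60,<1
2020-03-08,75,12
"""

# header=1 skips the title line and uses the second line as column names.
wide = pd.read_csv(io.StringIO(raw), header=1)

# One row per (week, keyword) pair instead of one column per keyword.
long = wide.set_index('Week').stack().reset_index()
long.columns = ['week', 'region', 'interest']
long['week'] = pd.to_datetime(long['week'])

# Google Trends writes "<1" for very low interest; drop those rows
# before converting the remaining values to floats.
long = long[long['interest'] != '<1'].copy()
long['interest'] = long['interest'].astype(float)

print(long)
```

The long format is what makes the later groupby and resample steps straightforward, since every observation carries its own week and keyword label.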
# STARTER CODE - PLEASE DO NOT EDIT ANY CODE IN THIS CELL
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style='white', palette='Pastel2')
import os
def read_file(filepath, plot = True):
    """
    Read a CSV file from a given filepath, convert it into a pandas DataFrame,
    and return a processed DataFrame with three columns: 'week', 'region', and 'interest'. Generate a line plot using Seaborn to visualize the data. This corresponds to the first graphic (time series) returned by trends.google.com.
    """
    file = pd.read_csv(filepath, header=1)
    df = file.set_index('Week').stack().reset_index()
    df.columns = ['week','region','interest']
    df['week'] = pd.to_datetime(df['week'])
    plt.figure(figsize=(8,3))
    df = df[df['interest']!="<1"]
    df['interest'] = df['interest'].astype(float)
    if plot:
        sns.lineplot(data = df, x= 'week', y= 'interest',hue='region')
    return df
def read_geo(filepath, multi=False):
    """
    Read a CSV file from a given filepath, convert it into a pandas DataFrame,
    and return a processed DataFrame with two columns: 'country' and 'interest'. Generate a bar plot using Seaborn to visualize the data. This corresponds to the second graphic returned by trends.google.com. Use multi=False if only one keyword is being analyzed, and multi=True if more than one keyword is being analyzed.
    """
    file = pd.read_csv(filepath, header=1)
    if not multi:
        file.columns = ['country', 'interest']
        plt.figure(figsize=(8,4))
        sns.barplot(data = file.dropna().iloc[:25,:], y = 'country', x='interest')
    if multi:
        plt.figure(figsize=(3,8))
        file = file.set_index('Country').stack().reset_index()
        file.columns = ['country','category','interest']
        file['interest'] = pd.to_numeric(file['interest'].apply(lambda x: x[:-1]))
        sns.barplot(data=file.dropna(), y = 'country', x='interest', hue='category')
    file = file.sort_values(ascending=False,by='interest')
    return file
# Start your coding here ....
Part 1
workout=read_file('data/workout.csv')
Part 2
# Alternative approach: keep only the month by using dt.to_period
# Convert the date to a monthly period
#workout['month'] = workout['week'].dt.to_period('M')
# Group by month and compute the mean
#workout_by_month = workout.groupby('month')['interest'].mean()
# Get the month with the highest mean interest, as a string
#max_month = workout_by_month.idxmax()
#month_str = max_month.strftime('%Y-%m-%d')
#month_str
# Approach used here: resample to month start ('MS'), so the day of each weekly date is ignored
# Select 'interest' explicitly so the text column 'region' is not passed to mean()
workout_by_month = workout.set_index('week').resample('MS')[['interest']].mean()
month_high = workout_by_month[workout_by_month['interest']==workout_by_month['interest'].max()]
month_str = str(month_high.index[0].date())
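As a rough check on the monthly aggregation above, the sketch below runs both the resample('MS') approach and the dt.to_period('M') approach on synthetic weekly data. The dates and interest values are made up for illustration and are not taken from data/workout.csv; the variable names are hypothetical.

```python
import pandas as pd

# Synthetic weekly series: five Sundays in March 2020, four in April 2020.
weeks = pd.date_range('2020-03-01', periods=9, freq='W-SUN')
demo = pd.DataFrame({
    'week': weeks,
    'region': 'workout: (Worldwide)',
    'interest': [50, 55, 60, 65, 90, 95, 100, 85, 40],
})

# Approach 1: resample to month start and average the numeric column only.
by_resample = demo.set_index('week')['interest'].resample('MS').mean()
demo_month_str = str(by_resample.idxmax().date())

# Approach 2: group by calendar month expressed as a monthly Period.
by_period = demo.groupby(demo['week'].dt.to_period('M'))['interest'].mean()
peak_period = by_period.idxmax()
# to_timestamp() defaults to the start of the period, which lines up
# with the month-start labels produced by resample('MS').
period_month_str = str(peak_period.to_timestamp().date())

print(demo_month_str, period_month_str)
```

On this toy data the March mean is 64 and the April mean is 80, so both approaches point to April 2020, even though the single highest week (100) is only one of four April observations.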
month_str
keyword=read_file('data/three_keywords.csv')
Part 3
# Date filter for the current period (2022 through 2023)
current_filter = keyword['week'].between('2022-01-01T00:00:00.000', '2023-12-31T00:00:00.000')
keyword_filter=keyword[current_filter]
# Group by keyword, pick the one with the highest mean interest, and split on ':' to keep only the keyword part of the label
current = keyword_filter.groupby('region')['interest'].mean().idxmax()
partes= current.split(':')
current= partes[0].strip()
current
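The filter, group, and label-cleaning steps above can be tried on a tiny made-up frame. The keyword labels and values below are assumptions chosen for illustration (including the "keyword: (Worldwide)" label format), not the contents of data/three_keywords.csv.

```python
import pandas as pd

# Hypothetical long-format data in the shape read_file returns.
demo = pd.DataFrame({
    'week': pd.to_datetime([
        '2020-04-05', '2020-04-05',
        '2022-06-05', '2022-06-05',
        '2023-01-01', '2023-01-01',
    ]),
    'region': ['home workout: (Worldwide)', 'gym workout: (Worldwide)'] * 3,
    'interest': [90.0, 40.0, 20.0, 55.0, 25.0, 60.0],
})

# Keep only rows inside the date window (both bounds are inclusive).
recent = demo[demo['week'].between('2022-01-01', '2023-12-31')]

# Mean interest per keyword label, then the label with the highest mean.
mean_by_label = recent.groupby('region')['interest'].mean()
top_label = mean_by_label.idxmax()

# Drop the ': (Worldwide)' suffix to keep just the keyword text.
top_keyword = top_label.split(':')[0].strip()

print(top_keyword)
```

Swapping mean() for max() in the groupby step ranks keywords by their single highest week instead of their average, which can change the winner when one keyword has a short spike.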
# Same approach as above, but the date filter covers a single year (2020) and the peak (max) interest is used instead of the mean
filter_covid = keyword['week'].between('2020-01-01T00:00:00.000', '2020-12-31T00:00:00.000')
keyword_covid=keyword[filter_covid]
peak_covid_grupo=keyword_covid.groupby('region')['interest'].max().idxmax()
partes_covid=peak_covid_grupo.split(':')
peak_covid=partes_covid[0].strip()
peak_covid
Part 4