Project: Data Driven Product Management: Conducting a Market Analysis

You are a product manager for a fitness studio based in Singapore and are interested in understanding the types of digital products you should offer. You already run successful local studios and have an established practice in Singapore. You want to understand the place of digital fitness products in your local market.

You would like to conduct a market analysis in Python to understand how to place your digital product in the regional market and what else is currently out there.

A market analysis will allow you to achieve several things. By identifying strengths of your competitors, you can gauge demand and create unique digital products and services. By identifying gaps in the market, you can find areas to offer a unique value proposition to potential users.

The sky is the limit for how you build on this beyond the project! Some areas to investigate next are in-person classes, local gyms, local fitness classes, personal instructors, and even online personal instructors.

import os

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set(style='white', palette='Pastel2')

def read_file(filepath, plot=True):
    """
    Read a time-series CSV from a given filepath into a long-format DataFrame.

    Returns a DataFrame with three columns: 'week', 'region', and 'interest'.
    If plot is True, also draws a Seaborn line plot of the data. This
    corresponds to the first graphic (time series) returned by trends.google.com.
    """
    file = pd.read_csv(filepath, header=1)
    df = file.set_index('Week').stack().reset_index()
    df.columns = ['week', 'region', 'interest']
    df['week'] = pd.to_datetime(df['week'])
    # Drop the "<1" placeholder values so the column can be cast to float;
    # .copy() avoids a SettingWithCopyWarning on the assignment below.
    df = df[df['interest'] != "<1"].copy()
    df['interest'] = df['interest'].astype(float)

    if plot:
        # Only open a figure when a plot is actually requested.
        plt.figure(figsize=(8, 3))
        sns.lineplot(data=df, x='week', y='interest', hue='region')
    return df
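
To see what the reshaping inside read_file does, here is a minimal sketch on an in-memory CSV. The sample rows and column labels are invented for illustration. They only assume the layout that header=1 implies (a title line, then a header row starting with 'Week') and are not taken from workout.csv.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed export layout (invented values).
raw = (
    "Category: All categories\n"
    "\n"
    "Week,workout: (Worldwide),gym: (Worldwide)\n"
    "2023-01-01,80,<1\n"
    "2023-01-08,75,12\n"
)

# header=1 skips the title line and uses the 'Week,...' row as the header.
wide = pd.read_csv(io.StringIO(raw), header=1)

# stack() turns one column per keyword into one row per (week, keyword) pair.
long_df = wide.set_index("Week").stack().reset_index()
long_df.columns = ["week", "region", "interest"]
long_df["week"] = pd.to_datetime(long_df["week"])

# Remove the "<1" placeholder before casting to float.
long_df = long_df[long_df["interest"] != "<1"].copy()
long_df["interest"] = long_df["interest"].astype(float)

print(long_df)
```

Each keyword column becomes a value in 'region', which is what lets sns.lineplot draw one line per keyword via hue='region'.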

def read_geo(filepath, multi=False):
    """
    Read a geographic CSV from a given filepath into a DataFrame and draw a
    Seaborn bar plot of the data.

    With multi=False (one keyword), the result has two columns: 'country' and
    'interest'. With multi=True (more than one keyword), it has three:
    'country', 'category', and 'interest'. This corresponds to the second
    graphic returned by trends.google.com.
    """
    file = pd.read_csv(filepath, header=1)

    if not multi:
        file.columns = ['country', 'interest']
        plt.figure(figsize=(8, 4))
        sns.barplot(data=file.dropna().iloc[:25, :], y='country', x='interest')
    else:
        plt.figure(figsize=(3, 8))
        file = file.set_index('Country').stack().reset_index()
        file.columns = ['country', 'category', 'interest']
        # Multi-keyword values carry one trailing character (assumed here to
        # be "%"); strip it before converting to a number.
        file['interest'] = pd.to_numeric(file['interest'].apply(lambda x: x[:-1]))
        sns.barplot(data=file.dropna(), y='country', x='interest', hue='category')

    file = file.sort_values(ascending=False, by='interest')
    return file
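
The multi=True branch strips the last character of each value before converting it to a number. The sketch below shows a slightly more defensive variant, assuming the trailing character is a percent sign; the sample values are invented, and rstrip plus errors="coerce" is a substitute technique rather than what read_geo itself does.

```python
import pandas as pd

# Hypothetical raw values as they might appear in a multi-keyword export.
raw = pd.Series(["45%", "5%", None, "100%"])

# rstrip("%") only removes a trailing percent sign, so a value without one
# is left intact; errors="coerce" turns anything unparseable into NaN.
clean = pd.to_numeric(raw.str.rstrip("%"), errors="coerce")

print(clean)
```

Unlike slicing with x[:-1], this does not truncate a value that happens to lack the suffix.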

1) Start by loading workout.csv as a variable workout, which tracks interest in the "workout" keyword. Use one of the functions provided to read and plot data over time.

import pandas as pd
import matplotlib.pyplot as plt

# Read and plot using function
workout = read_file("data/workout.csv")

2) Use workout to find the month in which global demand for fitness is highest, on average. Call set_index() on your variable with "week" as the first argument, chain this with .resample() using 'M' (for month) as the first argument, then chain .mean() and save the result to an object workout_by_month. Save the row of workout_by_month with the highest interest as month_high.

# numeric_only=True skips the text 'region' column, which cannot be averaged
workout_by_month = workout.set_index('week').resample('M').mean(numeric_only=True)
month_high = workout_by_month[workout_by_month.interest == workout_by_month.interest.max()]
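
As a cross-check on the step above, the peak month can also be found by grouping on calendar month and calling idxmax(). The sketch below uses invented weekly values, and it groups with dt.to_period("M") instead of resample(), a deliberate swap for illustration rather than the method the task asks for.

```python
import pandas as pd

# Hypothetical weekly data in the same long format read_file returns.
toy = pd.DataFrame({
    "week": pd.to_datetime([
        "2023-01-01", "2023-01-08", "2023-01-15", "2023-01-22",
        "2023-02-05", "2023-02-12", "2023-02-19", "2023-02-26",
    ]),
    "region": "workout: (Worldwide)",
    "interest": [90.0, 85.0, 80.0, 77.0, 60.0, 62.0, 58.0, 60.0],
})

# Average interest per calendar month.
monthly = toy.groupby(toy["week"].dt.to_period("M"))["interest"].mean()

# idxmax() returns the index label (here a monthly Period) of the largest value.
peak_month = monthly.idxmax()

print(monthly)
print(peak_month)
```

idxmax() returns only the first label if several months tie, whereas the boolean filter used for month_high keeps every tied row.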

3) Next, load the file called home_workout_gym_workout_home_gym.csv using the same function as before, saving it to any variable name. This file tracks global interest in three keywords. After visually assessing the plot, create a variable called current which equals the keyword that generated the most interest from 2022 to 2023. Create a second variable, peak_covid, indicating which keyword generated the most interest during 2020.

home_wkt = read_file("data/home_workout_gym_workout_home_gym.csv")

current = 'gym workout'
peak_covid = 'home workout'
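
The visual read of the plot can be double-checked numerically by averaging interest per keyword within a date window. This is a sketch only: top_keyword is a hypothetical helper, and the rows and keyword labels are invented, not values from the CSV.

```python
import pandas as pd

# Hypothetical long-format data: one row per (week, keyword).
toy = pd.DataFrame({
    "week": pd.to_datetime(
        ["2020-04-05", "2020-04-05", "2022-06-05", "2022-06-05"]
    ),
    "region": ["home workout", "gym workout", "home workout", "gym workout"],
    "interest": [90.0, 20.0, 15.0, 40.0],
})


def top_keyword(df, start, end):
    """Return the keyword with the highest mean interest between two dates."""
    window = df[df["week"].between(pd.Timestamp(start), pd.Timestamp(end))]
    return window.groupby("region")["interest"].mean().idxmax()


print(top_keyword(toy, "2020-01-01", "2020-12-31"))
print(top_keyword(toy, "2022-01-01", "2023-12-31"))
```

Applied to the DataFrame returned by read_file, the same helper would return the full column label from the file, which may include more text than the bare keyword.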

4) You'll now disaggregate global demand for the "workout" keyword by region to find the top 25 countries with the highest interest in workouts. Read workout_global.csv using the appropriate function provided for loading geographic data, and save it to workout_global. Return a DataFrame containing the 25 countries with the highest interest and save this as top_25_countries.

workout_global = read_geo("data/workout_global.csv")

top_25_countries = workout_global.sort_values(by='interest',ascending=False).head(25).reset_index(drop=True)
top_25_countries
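
An equivalent way to take the highest rows is DataFrame.nlargest(), which sorts and truncates in one call. The sketch below uses placeholder country names and invented values, and keeps two rows instead of 25 so the result is easy to read.

```python
import pandas as pd

# Hypothetical geographic data with one missing value.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "interest": [40.0, None, 100.0, 75.0],
})

# Drop rows with no interest score, then keep the two highest.
top_2 = (
    toy.dropna(subset=["interest"])
    .nlargest(2, "interest")
    .reset_index(drop=True)
)

print(top_2)
```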

5) Load geo_home_workout_gym_workout_home_gym.csv using the appropriate function provided for loading geographic data, and save it as geo_categories. This time, however, you are using multiple keywords, so you'll need to change one argument! You found previously that many countries from the Middle East and South Asia are in the top 25 countries with interest in workouts, including the "Philippines", "Singapore", the "United Arab Emirates", "Qatar", "Kuwait", "Malaysia", "Sri Lanka", "India", and "Pakistan". Filter geo_categories to return only these countries and save this as a DataFrame MESA.

# multi=True because the file covers three keywords
geo_categories = read_geo("data/geo_home_workout_gym_workout_home_gym.csv", multi=True)

countries_to_keep = ['Philippines','Singapore','United Arab Emirates','Qatar','Kuwait','Malaysia','Sri Lanka','India','Pakistan']
MESA = geo_categories[geo_categories.country.isin(countries_to_keep)]
MESA
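
Once the rows are filtered, pivoting makes it easier to compare keywords within each country. The sketch below assumes the three-column layout that read_geo returns with multi=True ('country', 'category', 'interest'); every value in it is invented for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (country, keyword).
toy = pd.DataFrame({
    "country": ["Singapore", "Singapore", "India", "India", "Brazil", "Brazil"],
    "category": ["home workout", "gym workout"] * 3,
    "interest": [30.0, 50.0, 45.0, 35.0, 20.0, 60.0],
})

# Keep only the countries of interest, as with isin() above.
subset = toy[toy["country"].isin(["Singapore", "India"])]

# One row per country, one column per keyword.
by_country = subset.pivot(index="country", columns="category", values="interest")

# For each country, the keyword with the highest interest.
leading = by_country.idxmax(axis=1)

print(by_country)
print(leading)
```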
