Project: Data Driven Product Management: Conducting a Market Analysis

    In this project, we act as a product manager for a fitness studio based in Singapore that wants to understand which types of digital products to offer. We will conduct a market analysis in Python to understand how to position our digital fitness products in the regional market. A market analysis will allow us to identify the strengths of our competitors, gauge demand, and create unique new digital products and services for potential users.

    We are provided with a number of CSV files in the "data" folder (under Files), which contain international Google Trends and YouTube keyword-search data on fitness and related products. Two helper functions, read_file and read_geo, are also provided to help us process and visualize these CSV files for further analysis.

    We'll use pandas methods to explore this data and drive our product management insights.
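Both helpers rely on the same wide-to-long reshaping step, so a minimal sketch of it may help before reading them. The CSV text below is made up and only assumed to resemble a Google Trends export (one title line, then a header row); the column names `Worldwide` and `Singapore` and all values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for an export file: one title line,
# then a header row with 'Week' and one column per region.
raw = io.StringIO(
    "Category: All categories\n"
    "Week,Worldwide,Singapore\n"
    "2020-03-01,50,40\n"
    "2020-03-08,75,<1\n"
)

wide = pd.read_csv(raw, header=1)  # header=1 skips the title line

# stack() turns each (week, region) cell into its own row.
tidy = wide.set_index("Week").stack().reset_index()
tidy.columns = ["week", "region", "interest"]
tidy["week"] = pd.to_datetime(tidy["week"])

# Drop the "<1" placeholder before converting interest to a number.
tidy = tidy[tidy["interest"] != "<1"].copy()
tidy["interest"] = tidy["interest"].astype(float)
```

The long layout (one row per week and region) is what Seaborn's `hue` argument expects, which is presumably why the helpers reshape before plotting.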

    import os
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    sns.set(style='white', palette='Pastel2')
    def read_file(filepath, plot=True):
        """
        Read a CSV file from a given filepath, convert it into a pandas DataFrame,
        and return a processed DataFrame with three columns: 'week', 'region', and 'interest'.
        Generate a line plot using Seaborn to visualize the data.
        This corresponds to the first graphic (time series) returned by
        """
        # The first line of the file is a title, so the header is on the second line
        file = pd.read_csv(filepath, header=1)
        # Reshape from wide (one column per region) to long (one row per week and region)
        df = file.set_index('Week').stack().reset_index()
        df.columns = ['week', 'region', 'interest']
        df['week'] = pd.to_datetime(df['week'])
        # Drop the "<1" placeholder values so the column can be cast to float
        df = df[df['interest'] != "<1"].copy()
        df['interest'] = df['interest'].astype(float)
        if plot:
            sns.lineplot(data=df, x='week', y='interest', hue='region')
        return df
    def read_geo(filepath, multi=False):
        """
        Read a CSV file from a given filepath, convert it into a pandas DataFrame,
        and return a processed DataFrame with two columns: 'country' and 'interest'.
        Generate a bar plot using Seaborn to visualize the data.
        This corresponds to the second graphic returned by
        Use multi=False if only one keyword is being analyzed, and multi=True if more
        than one keyword is being analyzed.
        """
        file = pd.read_csv(filepath, header=1)
        if not multi:
            file.columns = ['country', 'interest']
            # Plot the first 25 non-null rows
            sns.barplot(data=file.dropna().iloc[:25, :], y='country', x='interest')
        else:
            # Reshape from wide (one column per keyword) to long
            file = file.set_index('Country').stack().reset_index()
            file.columns = ['country', 'category', 'interest']
            # Strip the trailing character (presumably a percent sign) before converting
            file['interest'] = pd.to_numeric(file['interest'].apply(lambda x: x[:-1]))
            sns.barplot(data=file.dropna(), y='country', x='interest', hue='category')
        file = file.sort_values(ascending=False, by='interest')
        return file
    # 1. Load data on global interest in fitness
    workout = read_file('data/workout.csv')
    # 2. Assess global interest in fitness
    # Select 'interest' before averaging: the text column 'region' cannot be averaged
    workout_by_month = workout.set_index('week')[['interest']].resample('MS').mean()
    month_high = workout_by_month[workout_by_month['interest']==workout_by_month['interest'].max()]
    month_str = str(month_high.index[0].date())
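The monthly peak in step 2 can also be found with `idxmax`. The following is a minimal sketch on made-up weekly values, not the project's data; the names `weekly`, `monthly`, and `peak_month` are hypothetical.

```python
import pandas as pd

# Made-up weekly interest values, used only to illustrate the pattern.
weekly = pd.DataFrame({
    "week": pd.to_datetime(
        ["2020-03-01", "2020-03-15", "2020-04-05", "2020-04-19"]
    ),
    "interest": [60.0, 70.0, 90.0, 100.0],
})

# Average the weekly values within each calendar month ('MS' = month start).
monthly = weekly.set_index("week")["interest"].resample("MS").mean()

# idxmax returns the index label (a Timestamp) of the highest monthly mean.
peak_month = str(monthly.idxmax().date())
```

If several months tie for the maximum, `idxmax` returns the first one, which matches taking `index[0]` of a boolean-filtered frame.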
    # 3. Compare interest in home workouts, gym workouts and home gyms
    workout = read_file('data/three_keywords.csv') # This will create a lineplot
    current = 'gym workout'
    peak_covid = 'home workout'
    # 4. Segment global interest by region
    workout_global = read_geo('data/workout_global.csv')
    top_25_countries = workout_global.head(25)
    top_country = top_25_countries['country'].iloc[0]
    # 5. Assessing regional demand for home workouts, gym workouts and home gyms
    geo_categories = read_geo('data/geo_three_keywords.csv', multi=True)
    MESA_countries = ["Philippines", "Singapore", "United Arab Emirates", "Qatar", "Kuwait", "Malaysia", "Sri Lanka", "India", "Pakistan", "Lebanon"]
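    # Keep only the rows whose country is in the MESA list
    MESA = geo_categories.loc[geo_categories['country'].isin(MESA_countries), :]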
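Selecting a group of countries from a long-form table is a boolean-mask operation. Here is a minimal sketch, assuming a frame shaped like the output of `read_geo(..., multi=True)`; the rows, values, and the short `region` list are made up for illustration.

```python
import pandas as pd

# Made-up long-form data: one row per country and keyword category.
geo = pd.DataFrame({
    "country": ["Singapore", "Singapore", "Canada", "India"],
    "category": ["home workout", "gym workout", "home workout", "home workout"],
    "interest": [40, 45, 55, 50],
})
region = ["Singapore", "India"]  # hypothetical subset, not the full MESA list

# isin builds a boolean mask; .loc keeps the rows where the mask is True.
subset = geo.loc[geo["country"].isin(region), :]
```

Countries in the list that are missing from the data are simply ignored, so a misspelled name fails silently; comparing `set(region)` against `set(subset["country"])` is one way to catch that.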
    # 6. Assess the split of interest by country and category
    top_home_workout_country = 'Philippines'
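Rather than reading the answer off the bar plot, the leading country for a category could be derived from the table itself. This is a sketch under assumptions: the country labels and interest values are placeholders, and `long_form` stands in for a filtered frame with 'country', 'category', and 'interest' columns.

```python
import pandas as pd

# Placeholder data: country labels and values are invented for illustration.
long_form = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country B"],
    "category": ["home workout", "gym workout", "home workout", "gym workout"],
    "interest": [30, 50, 60, 20],
})

# One row per country, one column per category.
by_country = long_form.pivot(index="country", columns="category", values="interest")

# Country with the highest interest in the 'home workout' category.
top_home = by_country["home workout"].idxmax()
```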
    # 7. A deeper dive into two countries
    pilot_content = ['yoga', 'zumba']