Project: Analyzing Online Sports Revenue
    Sports clothing and athleisure attire is a huge industry, worth approximately $193 billion in 2021, with a strong growth forecast over the next decade!

    In this notebook, you will undertake the role of a product analyst for an online sports clothing company. The company is specifically interested in how it can improve revenue. You will dive into product data such as pricing, reviews, descriptions, and ratings, as well as revenue and website traffic, to produce recommendations for its marketing and sales teams.

    You've been provided with four datasets to investigate:

    brands.csv
      product_id: Unique product identifier
      brand: Brand of the product

    finance.csv
      product_id: Unique product identifier
      listing_price: Original price of the product
      sale_price: Discounted price of the product
      discount: Discount off the listing price, as a decimal
      revenue: Revenue generated by the product

    info.csv
      product_name: Name of the product
      product_id: Unique product identifier
      description: Description of the product

    reviews.csv
      product_id: Unique product identifier
      rating: Average product rating
      reviews: Number of reviews for the product

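    # Importing libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Importing the datasets
    brands = pd.read_csv("brands.csv")
    finance = pd.read_csv("finance.csv")
    info = pd.read_csv("info.csv")
    reviews = pd.read_csv("reviews.csv")

    # Merging the DataFrames on their shared key, product_id (inner join by default)
    merged_df = (
        info.merge(finance, on="product_id")
            .merge(reviews, on="product_id")
            .merge(brands, on="product_id")
    )

    # Dropping rows with null values
    merged_df = merged_df.dropna()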
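    Because the four tables share the product_id key, merging them with an inner join (pandas' default for merge, assumed here) keeps only products present in every file. A minimal sketch on two hypothetical toy tables (all values invented for illustration) shows how the validate argument of DataFrame.merge guards against duplicated keys, and how many rows the join and dropna() each remove:

```python
import numpy as np
import pandas as pd

# Hypothetical toy versions of two of the four tables (invented values)
finance = pd.DataFrame({
    "product_id": ["A1", "A2", "A3"],
    "listing_price": [50.0, np.nan, 65.0],
    "revenue": [1200.0, 900.0, 300.0],
})
brands = pd.DataFrame({
    "product_id": ["A1", "A2", "A4"],
    "brand": ["Adidas", "Nike", "Adidas"],
})

# Inner join: A3 (no brand row) and A4 (no finance row) are dropped.
# validate="one_to_one" raises pandas.errors.MergeError if product_id repeats.
merged = finance.merge(brands, on="product_id", how="inner", validate="one_to_one")

# dropna() then removes A2, whose listing_price is missing
cleaned = merged.dropna()

print(len(merged), len(cleaned))  # 2 1
```

    Comparing row counts before and after each step is a cheap way to see how much data the cleaning discards.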


    Grouping the data into listing_price quartiles makes it possible to compare the sales performance of Adidas and Nike products at each price level.

    Importance of Grouping

    1. Understanding Performance by Category: Grouping allows us to examine how well each brand does within specific price ranges. Questions to investigate include: Is Adidas more successful than Nike at selling high-priced products? Does Nike dominate the low-priced segment?

    2. Identifying Trends: Grouping helps spot trends or patterns in sales. Are there price labels where one brand consistently outperforms the other?

    3. Targeted Strategies: The insights gained can inform more targeted marketing and pricing strategies. For example, if Nike underperforms in the "High" segment, it might consider adjusting its product offerings or marketing campaigns.
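    The price labels are created with pd.qcut, which cuts listing_price at its quartiles so that each label holds roughly a quarter of the products. A small sketch on eight hypothetical prices (invented values) shows the split:

```python
import pandas as pd

# Hypothetical listing prices (invented values)
prices = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])

# q=4 cuts at the 25th, 50th and 75th percentiles, giving four equal-sized groups here
labels = pd.qcut(prices, q=4, labels=["Low", "Medium-Low", "Medium-High", "High"])

print(labels.value_counts(sort=False))  # 2 products per label
```

    Because the bins are quantile-based rather than fixed price bands, "High" means the top quarter of listing prices in this dataset, not a particular price threshold. Note that pd.qcut raises a ValueError if repeated prices make two quantile edges identical.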

    # Labelling listing_price quartiles: qcut puts roughly a quarter of the products in each label
    merged_df["price_label"] = pd.qcut(merged_df["listing_price"], q=4, labels=["Low", "Medium-Low", "Medium-High", "High"])
    # Grouping by brand and price label to get product counts and mean revenue
    adidas_vs_nike = merged_df.groupby(["brand", "price_label"], as_index=False, observed=False).agg(
        num_products=("price_label", "count"),
        mean_revenue=("revenue", "mean"),
    )

    Analysing Average Revenue by Price Label and Brand

    Key Insights:

    1. Adidas consistently outperforms Nike in average revenue across all price labels. The difference is most significant in the High and Medium-High categories.

    2. Both brands see their highest average revenue in the High price label category, suggesting that higher-priced products contribute significantly to their overall earnings.

    3. Nike's performance is more balanced across price labels, with a less pronounced difference between its highest and lowest revenue categories compared to Adidas.
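    The size of the gap between the brands in each price label can be read off directly by pivoting the grouped table. A minimal sketch, using a hypothetical stand-in for adidas_vs_nike with invented mean_revenue values (not the project's results):

```python
import pandas as pd

# Hypothetical stand-in for adidas_vs_nike (invented numbers, not the project's results)
order = ["Low", "Medium-Low", "Medium-High", "High"]
adidas_vs_nike = pd.DataFrame({
    "brand": ["Adidas"] * 4 + ["Nike"] * 4,
    "price_label": order * 2,
    "mean_revenue": [2000.0, 3000.0, 5000.0, 6000.0, 1000.0, 1500.0, 1250.0, 1500.0],
})

# One row per price label, one column per brand; reindex restores the price order
wide = adidas_vs_nike.pivot(index="price_label", columns="brand", values="mean_revenue")
wide = wide.reindex(order)

# Absolute gap and Adidas-to-Nike ratio for each price label
wide["gap"] = wide["Adidas"] - wide["Nike"]
wide["ratio"] = wide["Adidas"] / wide["Nike"]
print(wide)
```

    The gap column shows where the absolute difference is largest, while the ratio shows the relative difference, which is less affected by how expensive the segment is.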


    Adidas shows a strong correlation between higher price labels and increased average revenue. This suggests that Adidas's high-end products are particularly successful in generating revenue.

    Nike shows relatively low and steady revenue across all price segments, suggesting either a lesser focus on high-end products or less success in those segments compared to Adidas.

    Strategic Focus: To maximize revenue, the company could study what drives Adidas's strong revenue in the higher price segments and test whether similar strategies improve Nike's performance there.
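    The claim of a strong correlation between price label and revenue can be checked numerically. Because price_label is ordinal, a rank-based measure such as Spearman's rho fits better than Pearson's r. This check is added for illustration and is not part of the original analysis; the sketch below uses invented product-level rows and computes rho per brand as the Pearson correlation of the ranks:

```python
import pandas as pd

# Hypothetical product-level rows (invented values)
order = ["Low", "Medium-Low", "Medium-High", "High"]
df = pd.DataFrame({
    "brand": ["Adidas"] * 4 + ["Nike"] * 4,
    "price_label": pd.Categorical(order * 2, categories=order, ordered=True),
    "revenue": [1000.0, 2500.0, 4000.0, 7000.0, 900.0, 1100.0, 800.0, 1000.0],
})

def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman's rho: Pearson correlation of the ranks (ties get average ranks)."""
    return x.rank().corr(y.rank())

# Category codes give the ordinal position of each label (Low=0 ... High=3)
rhos = {
    brand: spearman(grp["price_label"].cat.codes, grp["revenue"])
    for brand, grp in df.groupby("brand")
}
print(rhos)  # Adidas close to 1.0, Nike close to 0.0
```

    On the real data, the same function could be applied per brand to merged_df's price_label and revenue columns.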

    # Analysing Average Revenue by Price Label and Brand
    plt.figure(figsize=(12, 8))
    # Plotting the data with one color per brand
    colors = ["limegreen", "orange"]
    sns.barplot(data=adidas_vs_nike, x='price_label', y='mean_revenue', hue='brand', palette=colors)
    # Adding labels and title
    plt.xlabel('Price Label')
    plt.ylabel('Mean Revenue')
    plt.title('Average Revenue by Price Label and Brand')
    # Display the plot
    plt.show()

    Analysing the Number of Products by Price Label and Brand

    Adidas has a consistently higher number of products across all price segments compared to Nike, particularly in the Medium-High segment where Nike's presence is minimal. Nike has fewer products overall, with a notably smaller presence in the Medium-High and Medium-Low segments.

    Strategic Insights:

    Adidas’s diverse product range across all price segments may contribute to its higher average revenue, as observed in the previous chart.

    Nike might benefit from expanding its product range, especially in the Medium-High segment, to potentially increase its market share and average revenue.
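    Because the two brands carry very different numbers of products, raw counts can hide how each catalogue is distributed across price labels. A minimal sketch, on invented product-level rows, that normalizes pd.crosstab by column so each brand's shares sum to 1:

```python
import pandas as pd

order = ["Low", "Medium-Low", "Medium-High", "High"]

# Hypothetical product-level rows (invented), one row per product
df = pd.DataFrame({
    "brand": ["Adidas"] * 5 + ["Nike"] * 3,
    "price_label": ["Low", "Medium-Low", "Medium-High", "Medium-High", "High",
                    "Low", "Low", "High"],
})

# normalize="columns" divides each brand's counts by that brand's total,
# so every column sums to 1; reindex restores the price order
share = pd.crosstab(df["price_label"], df["brand"], normalize="columns").reindex(order)
print(share.round(2))
```

    A segment where a brand's share is zero or near zero is a gap in its range, regardless of how many products the brand sells overall.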

    # Plot 2: Number of Products by Price Label and Brand
    plt.figure(figsize=(12, 8))
    sns.barplot(data=adidas_vs_nike, x='price_label', y='num_products', hue='brand', palette="bright")
    # Adding labels and title
    plt.xlabel('Price Label')
    plt.ylabel('Number of Products')
    plt.title('Number of Products by Price Label and Brand')
    # Display the plot
    plt.show()

    Revenue Generation Per Brand

    Adidas generates far more revenue than Nike. Of the total revenue of 12,303,207.7, Adidas generates 93.68% and Nike only 6.32%.

    from IPython.display import display, HTML
    # Grouping by brand to get the total revenue for each brand
    brand_revenue = merged_df.groupby("brand")["revenue"].sum().reset_index()
    # Rendering the table as HTML without the index and with centered column labels
    display(HTML(brand_revenue.to_html(index=False, max_rows=10, max_cols=2, justify='center')))
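    The percentages quoted above come from dividing each brand's summed revenue by the grand total. A minimal sketch of that arithmetic on hypothetical totals (invented values); applied to the real brand_revenue table, assuming the same merged and cleaned data, it should reproduce the 93.68% / 6.32% split:

```python
import pandas as pd

# Hypothetical per-brand totals (invented, not the project's figures)
brand_revenue = pd.DataFrame({"brand": ["Adidas", "Nike"], "revenue": [750.0, 250.0]})

# Share of combined revenue, in percent, rounded to two decimals
total = brand_revenue["revenue"].sum()
brand_revenue["share_pct"] = (brand_revenue["revenue"] / total * 100).round(2)
print(brand_revenue)
```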