Skip to content
Optimizing Online Sports Retail Revenue
  • AI Chat
  • Code
  • Report
  • 1. Counting missing values

    Sports clothing and athleisure attire is a huge industry, worth approximately $193 billion in 2021 with a strong growth forecast over the next decade!

    In this notebook, we play the role of a product analyst for an online sports clothing company. The company is specifically interested in how it can improve revenue. We will dive into product data such as pricing, reviews, descriptions, and ratings, as well as revenue and website traffic, to produce recommendations for its marketing and sales teams.

    The database provided to us, sports, contains five tables, with product_id being the primary key for all of them:

    info

    columndata typedescription
    product_namevarcharName of the product
    product_idvarcharUnique ID for product
    descriptionvarcharDescription of the product

    finance

    columndata typedescription
    product_idvarcharUnique ID for product
    listing_pricefloatListing price for product
    sale_pricefloatPrice of the product when on sale
    discountfloatDiscount, as a decimal, applied to the sale price
    revenuefloatAmount of revenue generated by each product, in US dollars

    reviews

    columndata typedescription
    product_namevarcharName of the product
    product_idvarcharUnique ID for product
    ratingfloatProduct rating, scored from 1.0 to 5.0
    reviewsfloatNumber of reviews for the product

    traffic

    columndata typedescription
    product_idvarcharUnique ID for product
    last_visitedtimestampDate and time the product was last viewed on the website

    brands

    columndata typedescription
    product_idvarcharUnique ID for product
    brandvarcharBrand of the product

    We will be dealing with missing data as well as numeric, string, and timestamp data types to draw insights about the products in the online store. Let's start by finding out how complete the data is.

    %%sql
    postgresql:///sports
    
    SELECT 
    COUNT(*) AS total_rows,
    COUNT(description) AS count_description,
    COUNT(listing_price) AS count_listing_price,
    COUNT(last_visited) AS count_last_visited
    FROM info
    INNER JOIN finance
    ON info.product_id=finance.product_id
    INNER JOIN traffic
    ON info.product_id=traffic.product_id;

    2. Nike vs Adidas pricing

    We can see the database contains 3,179 products in total. Of the columns we previewed, only one — last_visited — is missing more than five percent of its values. Now let's turn our attention to pricing.

    How do the price points of Nike and Adidas products differ? Answering this question can help us build a picture of the company's stock range and customer market. We will run a query to produce a distribution of the listing_price and the count for each price, grouped by brand.

    %%sql
    
    SELECT
    brand,
    listing_price::integer,
    COUNT(*)
    FROM finance
    INNER JOIN brands
    ON finance.product_id=brands.product_id
    WHERE finance.listing_price > 0
    GROUP BY brand, listing_price
    ORDER BY listing_price DESC;

    3. Labeling price ranges

    It turns out there are 77 unique prices for the products in our database, which makes the output of our last query quite difficult to analyze.

    Let's build on our previous query by assigning labels to different price ranges, grouping by brand and label. We will also include the total revenue for each price range and brand.

    %%sql
    
    SELECT
    brand,
    COUNT(*),
    SUM(revenue) AS total_revenue,
    CASE
    WHEN listing_price < 42 THEN 'Budget'
    WHEN listing_price >= 42 AND listing_price < 74 THEN 'Average'
    WHEN listing_price >= 74 AND listing_price < 129 THEN 'Expensive'
    WHEN listing_price > 129 THEN 'Elite'
    END price_category
    FROM finance
    INNER JOIN brands
    ON finance.product_id=brands.product_id
    WHERE brand IS NOT NULL
    GROUP BY brand, price_category
    ORDER BY total_revenue DESC;

    4. Average discount by brand

    Interestingly, grouping products by brand and price range allows us to see that Adidas items generate more total revenue regardless of price category! Specifically, "Elite" Adidas products priced \$129 or more typically generate the highest revenue, so the company can potentially increase revenue by shifting their stock to have a larger proportion of these products!

    Note we have been looking at listing_price so far. The listing_price may not be the price that the product is ultimately sold for. To understand revenue better, let's take a look at the discount, which is the percent reduction in the listing_price when the product is actually sold. We would like to know whether there is a difference in the amount of discount offered between brands, as this could be influencing revenue.

    %%sql
    
    SELECT
    brand,
    AVG(discount)*100::decimal AS average_discount
    FROM brands
    FULL JOIN finance
    ON brands.product_id=finance.product_id
    WHERE brands.brand IS NOT NULL
    GROUP BY brands.brand;

    5. Correlation between revenue and reviews

    Strangely, no discount is offered on Nike products! In comparison, not only do Adidas products generate the most revenue, but these products are also heavily discounted!

    To improve revenue further, the company could try to reduce the amount of discount offered on Adidas products, and monitor sales volume to see if it remains stable. Alternatively, it could try offering a small discount on Nike products. This would reduce average revenue for these products, but may increase revenue overall if there is an increase in the volume of Nike products sold.

    Now explore whether relationships exist between the columns in our database. We will check the strength and direction of a correlation between revenue and reviews.

    %%sql
    
    SELECT
    corr(reviews, revenue) AS review_revenue_corr
    FROM finance
    INNER JOIN reviews
    USING(product_id)

    6. Ratings and reviews by product description length

    Interestingly, there is a strong positive correlation between revenue and reviews. This means, potentially, if we can get more reviews on the company's website, it may increase sales of those items with a larger number of reviews.

    Perhaps the length of a product's description might influence a product's rating and reviews — if so, the company can produce content guidelines for listing products on their website and test if this influences revenue. Let's check this out!

    %%sql
    
    SELECT TRUNC(LENGTH(i.description), -2) AS description_length,
        ROUND(AVG(r.rating::numeric), 2) AS average_rating
    FROM info AS i
    INNER JOIN reviews AS r 
        ON i.product_id = r.product_id
    WHERE i.description IS NOT NULL
    GROUP BY description_length
    ORDER BY description_length;

    7. Reviews by month and brand

    Unfortunately, there doesn't appear to be a clear pattern between the length of a product's description and its rating.

    As we know a correlation exists between reviews and revenue, one approach the company could take is to run experiments with different sales processes encouraging more reviews from customers about their purchases, such as by offering a small discount on future purchases.

    Let's take a look at the volume of reviews by month to see if there are any trends or gaps we can look to exploit.

    %%sql
    
    SELECT
    b.brand,
    EXTRACT(MONTH FROM last_visited) AS month,
    count(*) AS num_reviews
    FROM reviews AS r
    INNER JOIN brands AS b
    USING(product_id)
    INNER JOIN traffic AS t
    USING(product_id)
    WHERE brand IS NOT NULL AND EXTRACT(MONTH FROM last_visited) IS NOT NULL
    GROUP BY brand, month
    ORDER BY brand, month

    8. Footwear product performance

    Looks like product reviews are highest in the first quarter of the calendar year, so there is scope to run experiments aiming to increase the volume of reviews in the other nine months!

    So far, we have been primarily analyzing Adidas vs Nike products. Now, let's switch our attention to the type of products being sold. As there are no labels for product type, we will create a Common Table Expression (CTE) that filters description for keywords, then use the results to find out how much of the company's stock consists of footwear products and the median revenue generated by these items.