Skip to content
Project: Analyzing Online Sports Revenue
Sports clothing and athleisure attire is a huge industry, worth approximately $193 billion in 2021 with a strong growth forecast over the next decade!
In this notebook, you will undertake the role of a product analyst for an online sports clothing company. The company is specifically interested in how it can improve revenue. You will dive into product data such as pricing, reviews, descriptions, and ratings, as well as revenue and website traffic, to produce recommendations for its marketing and sales teams.
You've been provided with four datasets to investigate:
brands.csv
Columns | Description |
---|---|
product_id | Unique product identifier |
brand | Brand of the product |
finance.csv
Columns | Description |
---|---|
product_id | Unique product identifier |
listing_price | Original price of the product |
sale_price | Discounted price of the product |
discount | Discount off the listing price, as a decimal |
revenue | Revenue generated by the product |
info.csv
Columns | Description |
---|---|
product_name | Name of the product |
product_id | Unique product identifier |
description | Description of the product |
reviews.csv
Columns | Description |
---|---|
product_id | Unique product identifier |
rating | Average product rating |
reviews | Number of reviews for the product |
import pandas as pd
import numpy as np
brands = pd.read_csv("brands.csv")
finance = pd.read_csv("finance.csv")
info = pd.read_csv("info.csv")
reviews = pd.read_csv("reviews.csv")
# Start coding here...
merged_df = brands.merge(finance, on='product_id')
merged_df = merged_df.merge(info, on='product_id')
merged_df = merged_df.merge(reviews, on='product_id')
merged_df = merged_df.dropna()
# Question 1 - What is the volume of products and average revenue for Adidas and Nike products based on listing price quartiles
# assign the quartiles to variables to be used inside your if-function
first_quartile = merged_df['listing_price'].quantile(0.25)
second_quartile = merged_df['listing_price'].quantile(0.50)
third_quartile = merged_df['listing_price'].quantile(0.75)
fourth_quartile = merged_df['listing_price'].quantile(1.00)
def brand_label(row):
if row['listing_price'] <= first_quartile:
return 'Budget'
elif row['listing_price'] <= second_quartile:
return 'Average'
elif row['listing_price'] <= third_quartile:
return 'Expensive'
elif row['listing_price'] <= fourth_quartile:
return 'Elite'
merged_df['price_label'] = merged_df.apply(brand_label, axis=1)
adidas_vs_nike = merged_df.groupby(['brand', 'price_label'], as_index=False).agg({'product_id': 'count', 'revenue': 'mean'})
adidas_vs_nike = adidas_vs_nike.rename(columns={'product_id': 'num_products', 'revenue': 'mean_revenue'})
adidas_vs_nike['mean_revenue'] = adidas_vs_nike['mean_revenue'].round(2)
print('Question 1: \n\nThe table below shows that Elite Adidas products are more highly favoured in the market; while consumers buy more of the budget Nike products.')
print(adidas_vs_nike)
# Question 2 - Do any differences exist between the word count of a product's description and its mean rating
merged_df['description_length'] = merged_df['description'].apply(len)
# Upper description length limits
bins = [0, 100, 200, 300, 400, 500, 600, 700]
# Description length labels
labels = ["100", "200", "300", "400", "500", "600", "700"]
# Cut into bins
merged_df["description_length"] = pd.cut(merged_df["description_length"], bins=bins, labels=labels)
# Group by the bins
description_lengths = merged_df.groupby("description_length", as_index=False).agg(
mean_rating=("rating", "mean"),
num_reviews=("reviews", "count")
).round(2)
print('\nQuestion 2: \n\nThe description_length df shows that there are differences between the word count of a product description and its mean rating. \n\nGenerally, the more descriptive a product, the higer the mean rating. \n\nHowever, the change in rating is not proportionate to the descriptiveness or otherwise of the product.')
print(description_lengths)