Sports clothing and athleisure attire is a huge industry, worth approximately $193 billion in 2021 with a strong growth forecast over the next decade!
In this notebook, you will undertake the role of a product analyst for an online sports clothing company. The company is specifically interested in how it can improve revenue. You will dive into product data such as pricing, reviews, descriptions, and ratings, as well as revenue and website traffic, to produce recommendations for its marketing and sales teams.
You've been provided with four datasets to investigate:
brands.csv
Columns | Description |
---|---|
product_id | Unique product identifier |
brand | Brand of the product |
finance.csv
Columns | Description |
---|---|
product_id | Unique product identifier |
listing_price | Original price of the product |
sale_price | Discounted price of the product |
discount | Discount off the listing price, as a decimal |
revenue | Revenue generated by the product |
info.csv
Columns | Description |
---|---|
product_name | Name of the product |
product_id | Unique product identifier |
description | Description of the product |
reviews.csv
Columns | Description |
---|---|
product_id | Unique product identifier |
rating | Average product rating |
reviews | Number of reviews for the product |
#Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#importing the datasets
brands = pd.read_csv("brands.csv")
finance = pd.read_csv("finance.csv")
info = pd.read_csv("info.csv")
reviews = pd.read_csv("reviews.csv")
# Merging the four DataFrames on product_id
# (info was previously merged twice, which left reviews out and duplicated its columns)
merged_df = finance.merge(info, on="product_id")
merged_df = merged_df.merge(reviews, on="product_id")
merged_df = merged_df.merge(brands, on="product_id")
# Dropping rows with null values
merged_df.dropna(inplace=True)
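Before relying on the merged table, it can help to confirm that product_id is unique in each file and to see how many rows dropna removes. The following is a minimal sketch on toy data; the frames `finance_toy` and `reviews_toy` and their values are made up for illustration and are not taken from the CSV files.

```python
import pandas as pd

# Toy stand-ins for finance.csv and reviews.csv (hypothetical values)
finance_toy = pd.DataFrame(
    {"product_id": ["A1", "A2", "A3"], "revenue": [100.0, 250.0, None]}
)
reviews_toy = pd.DataFrame(
    {"product_id": ["A1", "A2", "A3"], "rating": [4.5, 3.9, 4.1]}
)

# validate="one_to_one" raises pandas.errors.MergeError if product_id
# is duplicated on either side of the merge
merged_toy = finance_toy.merge(reviews_toy, on="product_id", validate="one_to_one")

# Count how many rows are lost to missing values
rows_before = len(merged_toy)
merged_toy = merged_toy.dropna()
rows_dropped = rows_before - len(merged_toy)
print(f"Dropped {rows_dropped} of {rows_before} rows with missing values")
```

The same `validate` argument could be passed to each merge in the cell above, assuming product_id is meant to be unique in every file.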
Grouping
Grouping the data using the listing_price helps to analyze the sales performance of Adidas and Nike products, categorized by their price levels.
Importance of Grouping
1. Understanding Performance by Category: Grouping allows us to examine how well each brand is doing within specific price ranges. Questions to explore include: Is Adidas more successful at selling high-priced products than Nike? Does Nike dominate the low-priced market segment?
2. Identifying Trends: Grouping helps spot trends or patterns in sales. Are there certain price labels where one brand consistently outperforms the other?
3. Targeted Strategies: The insights gained can inform more targeted marketing and pricing strategies. For example, if Nike underperforms in the "High" segment, it might consider adjusting its product offerings or marketing campaigns.
# Binning listing_price into quartiles with descriptive labels
merged_df["price_label"] = pd.qcut(merged_df["listing_price"], q=4, labels=["Low", "Medium-Low", "Medium-High", "High"])
# Grouping by brand and price label to get product counts and mean revenue
adidas_vs_nike = merged_df.groupby(["brand", "price_label"], as_index=False).agg(
    num_products=("price_label", "count"),
    mean_revenue=("revenue", "mean")).round(2)
print(adidas_vs_nike)
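To make the quartile binning concrete, here is a small sketch of how pd.qcut assigns the four labels. The `prices` series is hypothetical toy data, chosen so that each quartile receives the same number of items.

```python
import pandas as pd

# Hypothetical listing prices, evenly spread so the quartiles are easy to follow
prices = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], name="listing_price")
labels = ["Low", "Medium-Low", "Medium-High", "High"]

# qcut splits on the 25th, 50th and 75th percentiles of the data itself,
# so each label covers roughly a quarter of the products
price_label = pd.qcut(prices, q=4, labels=labels)

# sort=False keeps the counts in category order (Low to High)
counts = price_label.value_counts(sort=False)
print(counts)
```

Because the bins come from the data's own quantiles, the labels are relative to this catalogue rather than fixed price thresholds. If many products share the same price, qcut can raise a ValueError about non-unique bin edges.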
Analysing Average Revenue by Price Label and Brand
Key Insights:
1. Adidas consistently outperforms Nike in revenue across all price labels. The difference is most significant in the High and Medium-High categories.
2. Both brands see their highest average revenue in the High price label category, suggesting that higher-priced products contribute significantly to their overall earnings.
3. Nike's performance is more balanced across price labels, with a less pronounced difference between its highest and lowest revenue categories compared to Adidas.
Conclusion:
Adidas shows a strong correlation between higher price labels and increased average revenue. This suggests that Adidas's high-end products are particularly successful in generating revenue.
Nike shows relatively low and steady revenue across all price segments, indicating either a lesser focus on high-end products or less success in those segments compared to Adidas.
Strategic Focus: To maximize revenue, it would be advantageous to emphasize the success factors behind Adidas’s high-revenue generation in higher price segments and potentially apply similar strategies to enhance Nike’s performance in those areas.
# Analysing Average Revenue by Price Label and Brand
plt.figure(figsize=(12, 8))
# Plotting the data
colors = ["limegreen", "orange"]
sns.set_palette(sns.color_palette(colors))
sns.barplot(data=adidas_vs_nike, x='price_label', y='mean_revenue', hue='brand')
# Adding labels and title
plt.xlabel('Price Label')
plt.ylabel('Mean Revenue')
plt.title('Average Revenue by Price Label and Brand')
plt.legend(title='Brand')
# Display the plot
plt.show()
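The comparison of how far each brand's average revenue ranges across price labels can also be checked numerically. The sketch below pivots a summary table shaped like adidas_vs_nike and takes the gap between each brand's highest and lowest mean revenue. The frame `summary_toy` and its numbers are invented for illustration and do not reflect the actual results.

```python
import pandas as pd

# Hypothetical summary in the same shape as adidas_vs_nike (values are made up)
summary_toy = pd.DataFrame({
    "brand": ["Adidas", "Adidas", "Nike", "Nike"],
    "price_label": ["Low", "High", "Low", "High"],
    "mean_revenue": [1000.0, 5000.0, 800.0, 1200.0],
})

# One row per price label, one column per brand
wide = summary_toy.pivot(index="price_label", columns="brand", values="mean_revenue")

# Gap between each brand's best and worst price label
spread = wide.max() - wide.min()
print(wide)
print(spread)
```

A smaller spread for one brand would support describing its revenue as more balanced across price labels.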
Analysing the Number of Products by Price Label and Brand
Adidas has a consistently higher number of products across all price segments compared to Nike, particularly in the Medium-High segment where Nike's presence is minimal. Nike has fewer products overall, with a notably smaller presence in the Medium-High and Medium-Low segments.
Strategic Insights:
Adidas’s diverse product range across all price segments may contribute to its higher average revenue, as observed in the previous chart.
Nike might benefit from expanding its product range, especially in the Medium-High segment, to potentially increase its market share and average revenue.
# Plot 2: Number of Products by Price Label and Brand
plt.figure(figsize=(12, 8))
sns.barplot(data=adidas_vs_nike, x='price_label', y='num_products', hue='brand', palette="bright")
# Adding labels and title
plt.xlabel('Price Label')
plt.ylabel('Number of Products')
plt.title('Number of Products by Price Label and Brand')
plt.legend(title='Brand')
# Display the plot
plt.show()
merged_df.info()
Revenue Generation Per Brand
Adidas generates far more revenue than Nike. Of the total revenue of 12,303,207.7, Adidas accounts for 93.68%, while Nike accounts for only 6.32%.
# Grouping by brand to get the total revenue for each brand
brand_revenue = merged_df.groupby("brand")["revenue"].sum().reset_index()
# Display the table as HTML without the index column
from IPython.display import display, HTML
display(HTML(brand_revenue.to_html(index=False, max_rows=10, max_cols=2, justify='center')))
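The percentage shares quoted for each brand can be derived directly from the grouped totals. This is a minimal sketch on toy data; the frame `toy` and the column name `revenue_share_pct` are assumptions introduced for illustration.

```python
import pandas as pd

# Hypothetical per-product revenue (values are made up)
toy = pd.DataFrame({
    "brand": ["Adidas", "Adidas", "Nike"],
    "revenue": [600.0, 300.0, 100.0],
})

# Total revenue per brand, as in the cell above
brand_revenue_toy = toy.groupby("brand")["revenue"].sum().reset_index()

# Each brand's share of overall revenue, as a percentage
total_revenue = brand_revenue_toy["revenue"].sum()
brand_revenue_toy["revenue_share_pct"] = (
    brand_revenue_toy["revenue"] / total_revenue * 100
).round(2)
print(brand_revenue_toy)
```

Applying the same two lines to brand_revenue would reproduce the share figures from the current state of merged_df, so the written percentages stay in sync with the data.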