Source: https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset/data
Revealing Customer Shopping Trends Dataset
Introduction
Customer feedback is one of the most important indicators of how a business is doing. It can either attract more clients or cause customers to churn.
In this project, we'll dig into different aspects of customer buying preferences to investigate what affects the review rating.
I will answer business questions such as:
- What is the average review rating and how close is it to the target value?
- How do purchase history and subscription status affect the rating given?
- Do demographics have anything to do with the rating?
- Which clothing categories, colors, and specific items get the highest/lowest rating?
- In which states do customers leave the best/worst reviews?
In this project, I used Power Query for data validation and cleaning, and Power BI for analysis and creating a dashboard. DataCamp Workspace was used to build this report with insights and recommendations.
The analysis and visualizations are based on a publicly available Kaggle dataset. The data can be downloaded from Kaggle in CSV format.
Data Validation and Cleaning
Checking data for errors and inconsistencies is a tedious but crucial part of creating any data report. In Power Query (Power BI), I loaded a CSV file containing 3900 rows.
Each observation represents one unique customer. The data is stored in 18 columns:
- Customer ID: A unique identifier of each customer stored as a whole number.
- Age: The whole number in years.
- Gender: One of two distinct values: "Male" or "Female". Stored as text.
- Item Purchased: The last item the customer purchased (e.g. Dress, Skirt, Hoodie). Stored as text.
- Category: The category of an item, e.g. Clothing, Footwear. Text format.
- Purchase Amount (USD): Specifies the amount paid in USD. Stored as a whole number.
- Location: The US state where the customer is located. Stored in State or Province format.
- Size: Indicates the size of an item purchased, e.g. S, M, XL. Stored as text.
- Color: The information about the item's color, in a text format.
- Season: One of four distinct values: Fall, Winter, Spring and Summer. The data format is text.
- Review Rating: The rating the customer assigned to the most recent purchase. Stored as a decimal number between 1.0 and 5.0.
- Subscription Status: Indicates whether a customer has a subscription. Two text values possible - "Yes" or "No".
- Shipping Type: Describes the way an order was delivered. One of five distinct text values.
- Discount Applied: Shows if the discount was applied. Two possible text values - "Yes" or "No".
- Promo Code Used: Indicates whether the promo code was used. Two text values possible - "Yes" or "No".
- Previous Purchases: The whole number showing how many previous purchases the customer has made.
- Payment Method: Describes how the payment was processed. One of 6 distinct text values.
- Frequency of Purchases: Shows how often this customer buys something from the company. One of 7 distinct text values, e.g. Weekly.
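The checks in this report were done in Power Query, but the same rules can be expressed in code. Below is a minimal sketch using only the Python standard library. It assumes the column names listed above; the two-row `SAMPLE_CSV`, the `ALLOWED` mapping, and the `validate_rows` helper are hypothetical and only illustrate the idea, they are not the steps actually applied to the dataset.

```python
import csv
import io

# Hypothetical two-row sample that reuses a few of the column names above.
# The real file has 3900 rows and 18 columns.
SAMPLE_CSV = """Customer ID,Age,Gender,Review Rating,Subscription Status
1,55,Male,3.1,Yes
2,19,Female,5.4,No
"""

# Text columns with a fixed set of allowed values, as described in the list above.
ALLOWED = {
    "Gender": {"Male", "Female"},
    "Subscription Status": {"Yes", "No"},
}


def validate_rows(rows):
    """Return a list of (customer_id, problem) tuples for rows that break a rule."""
    problems = []
    seen_ids = set()
    for row in rows:
        cid = row["Customer ID"]
        # Each observation should represent one unique customer.
        if cid in seen_ids:
            problems.append((cid, "duplicate Customer ID"))
        seen_ids.add(cid)
        # Age is expected to be a whole number.
        if not row["Age"].isdigit():
            problems.append((cid, "Age is not a whole number"))
        # Review Rating is expected to be a decimal between 1.0 and 5.0.
        try:
            rating = float(row["Review Rating"])
        except ValueError:
            problems.append((cid, "Review Rating is not a number"))
        else:
            if not 1.0 <= rating <= 5.0:
                problems.append((cid, "Review Rating outside 1.0-5.0"))
        # Categorical columns may only contain the allowed values.
        for column, allowed in ALLOWED.items():
            if row[column] not in allowed:
                problems.append((cid, f"unexpected {column}: {row[column]!r}"))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = validate_rows(rows)
print(problems)
```

In this made-up sample, only the second row is flagged, because its rating of 5.4 falls outside the allowed range.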
Analysis and Visualization
Overview
Across the 3900 customers, the average review rating is 3.78 on a scale from 1.0 to 5.0. This is 11% below the target average rating of 4.25. Male customers tend to leave higher ratings than female customers (3.83 vs 3.69). Subscribers leave slightly better reviews (3.91) than non-subscribers (3.74). Both genders are represented among customers without a subscription, but all subscribers are men. One group, non-subscribers who used a discount, gave an average rating as low as 3.68; they are all men.
Purchase frequency is distributed fairly evenly. In terms of review rating, customers who buy on a monthly basis left the highest rating, 3.95 (men 4.12, women 3.64), compared with 3.54 (men 3.58, women 3.35) for customers who shop quarterly.
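For readers who want to check the percentages, here is a small worked example. It uses the averages quoted in this report (3.78 against the 4.25 target, and 3.83 against 3.69 for male and female customers); the helper name `percent_difference` is my own and not part of any tool used in the analysis.

```python
def percent_difference(value, reference):
    """Signed difference of value from reference, as a percentage of reference."""
    return (value - reference) / reference * 100


# Average rating compared with the target average rating.
shortfall = percent_difference(3.78, 4.25)
# Male average rating compared with the female average rating.
gender_gap = percent_difference(3.83, 3.69)

print(f"{shortfall:.1f}%")   # -11.1%
print(f"{gender_gap:.1f}%")  # 3.8%
```

Rounded to whole numbers, these are the 11% shortfall and the roughly 4% gender gap used in the rest of the report.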
The top 3 states by rating are: Texas (3.91), Wisconsin (3.89), Iowa (3.85).
The bottom 3 states by rating are: West Virginia (3.58), Oklahoma (3.61), New Hampshire (3.61).
In Texas, the best rating of 4.36 came from female customers who have no subscription, used no discount, and buy bi-weekly. In Wisconsin, by contrast, the lowest rating of 3.0 came from female customers without a subscription or discount who buy annually or quarterly.
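The state ranking comes from a Power BI visual, but the underlying calculation is a plain group-and-average. A minimal sketch in standard-library Python follows; the `reviews` pairs are invented for illustration and the `rank_states` helper is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (state, review rating) pairs, not taken from the dataset.
reviews = [
    ("Texas", 4.5), ("Texas", 3.5),
    ("Iowa", 3.8), ("Iowa", 4.0),
    ("Oklahoma", 3.0), ("Oklahoma", 4.0),
]


def rank_states(reviews):
    """Return (state, average rating) pairs sorted from best to worst."""
    ratings_by_state = defaultdict(list)
    for state, rating in reviews:
        ratings_by_state[state].append(rating)
    averages = {s: round(mean(r), 2) for s, r in ratings_by_state.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranked = rank_states(reviews)
top, bottom = ranked[:1], ranked[-1:]  # widen the slices to 3 for a top/bottom 3
print(top, bottom)
```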
Rating and Previous Purchases
As seen on the Average of Previous Purchases by Subscription Status and Review Rating chart, there's no correlation between the number of previous purchases and the most recent rating given. However, customers who have a subscription tend to have a longer buying history.
There's also an interesting detail: subscribers who left the highest rating of 5.0 had previously bought only 18.67 items on average, which is the lowest value on the chart.
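The "no correlation" observation above was read off the chart. One way to put a number on it is the Pearson correlation coefficient, sketched below in plain Python. This is an assumption about how one could verify the claim, not the method used in the report, and the two short lists are invented values chosen so that the coefficient comes out at zero.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denominator = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sum(a * b for a, b in zip(dx, dy)) / denominator


# Hypothetical values: previous purchases and the rating each customer gave.
previous_purchases = [10, 20, 30, 40]
review_ratings = [3.0, 4.0, 4.0, 3.0]

r = pearson_r(previous_purchases, review_ratings)
print(r)
```

A value close to 0 means no linear relationship, while values near 1 or -1 would indicate a strong positive or negative one.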
As for the relationship between Average Purchase Amount and Review Rating, the average bill was slightly higher for non-subscribers (USD 60.05) than for subscribers (USD 59.51). The two groups diverged the most at a Review Rating of 4.20, where non-subscribers paid USD 8.66 more on average than customers with a subscription.
Rating and Demographics
As seen on the violin plot, customer age is evenly distributed between 18 and 70 years. The median age is 44 for both male and female customers. There are 2652 male customers (68%) and 1248 female customers (32%).
The youngest customers (18-25 years old) gave the highest rating, 3.88, whereas customers between 46 and 55 years old rated their experience 3.71, the lowest of any age group.
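Grouping customers by age bracket is a binning step. Here is a rough sketch of how it could be done in Python. Only the 18-25 and 46-55 brackets are named in this report; the other bracket boundaries, the sample customers, and the helper names are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

# Lower bound of each bracket. Only 18-25 and 46-55 appear in the report;
# the remaining brackets are assumed.
LOWER_BOUNDS = [18, 26, 36, 46, 56]
LABELS = ["18-25", "26-35", "36-45", "46-55", "56-70"]


def age_group(age):
    """Map an age in years to its bracket label."""
    if not 18 <= age <= 70:
        raise ValueError(f"age {age} is outside the 18-70 range of the dataset")
    return LABELS[bisect_right(LOWER_BOUNDS, age) - 1]


def average_rating_by_age_group(customers):
    """Average the rating of (age, rating) pairs within each age bracket."""
    ratings = defaultdict(list)
    for age, rating in customers:
        ratings[age_group(age)].append(rating)
    return {group: round(mean(values), 2) for group, values in ratings.items()}


# Hypothetical (age, review rating) pairs.
customers = [(19, 4.0), (24, 3.0), (50, 3.5)]
by_group = average_rating_by_age_group(customers)
print(by_group)
```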
Shirts have the lowest average rating (3.63) despite being the second best-selling item (4.43% of sales). Gloves have the highest average rating (3.86) but account for only 3.64% of sales, making them the second lowest-selling item.
Rating and Item Characteristics
The highest average rating was recorded for Footwear category items (3.79), followed by Accessories (3.77), Outerwear (3.75) and Clothing (3.72).
The average review rating also varies by season, reaching as high as 3.82 in spring and dropping to 3.64 in fall.
Looking at each item's color, here are the most popular colors by season: fall - yellow, winter - green, spring - olive, summer - silver. The differences between the most popular and the second most popular color are marginal, regardless of season.
Top 5 colors of women's items: green, pink, magenta, yellow, teal.
Top 5 colors of men's items: green, cyan, violet, teal, olive.
Gold items received an average rating of 4.23, while gray items were rated as low as 3.23.
Female customers who ordered items from the Clothing category in summer via 2-Day Shipping left an average rating as low as 3.48.
Female customers who ordered items from the Footwear category in fall via Standard shipping left an average rating as high as 4.30, which is above the target rating. But there were only 6 such customers.
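Because narrow segments like the one above can contain very few customers, it helps to report the group size next to every average. The sketch below shows one way to do that in Python. The six ratings are invented so that they average 4.30, and both the `MIN_SEGMENT_SIZE` threshold of 30 and the `summarize_segments` helper are assumptions, not values or steps taken from the analysis.

```python
from collections import defaultdict
from statistics import mean

MIN_SEGMENT_SIZE = 30  # hypothetical cut-off below which an average is treated as anecdotal


def summarize_segments(rows, keys, min_size=MIN_SEGMENT_SIZE):
    """Average Review Rating per segment, together with the segment size."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["Review Rating"])
    return {
        segment: {
            "mean": round(mean(ratings), 2),
            "n": len(ratings),
            "reliable": len(ratings) >= min_size,
        }
        for segment, ratings in groups.items()
    }


# Six hypothetical customers in one narrow segment.
rows = [
    {"Gender": "Female", "Category": "Footwear", "Season": "Fall",
     "Shipping Type": "Standard", "Review Rating": rating}
    for rating in (4.0, 4.5, 4.5, 4.0, 4.3, 4.5)
]

summary = summarize_segments(rows, ["Gender", "Category", "Season", "Shipping Type"])
print(summary)
```

With only 6 customers, the segment is marked as not reliable even though its average is above the target.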
If you want to learn more about customer data insights, feel free to follow this link
Key Findings
- The average review rating equals 3.78, which is 11% below the target value;
- There was no evidence that purchase history or subscription status affected the review rating;
- Male customers rate their purchases 4% higher than female customers;
- Footwear gets the highest rating, Clothing the lowest. The best-reviewed item is Gloves, the worst-reviewed is Shirts. The gold color items get the highest rating, gray items get the lowest;
- Customers in Texas leave the highest reviews, while customers in West Virginia leave the lowest.
Recommendations
- Improve Product Quality: Since the average review rating is 11% below the target value, focus on enhancing the quality of products, especially in lower-rated categories such as Clothing and lower-rated items such as Shirts.
- Targeted Marketing: Develop marketing strategies that highlight the high-rated products such as Footwear and Gloves, and promote items in gold color to attract more positive reviews.
- Gender-Specific Campaigns: Since male customers rate their purchases 4% higher than female customers, consider creating targeted campaigns to understand and address the specific needs and preferences of female customers.
- Regional Focus: Implement region-specific strategies to improve customer satisfaction in areas with lower ratings, such as West Virginia, while maintaining high standards in regions like Texas.
- Customer Feedback Loop: Establish a robust feedback mechanism to gather more detailed insights from customers, especially focusing on areas with lower ratings, to continuously improve products and services.
Thank you for reading till the very end! Your feedback and sharing are greatly appreciated.