Data Analysis Portfolio Project #1---Retail Analysis
Tables Of Contents
- Project Background
- Executive Summary
- Data Description and Cleaning
- Initial Exploratory Analysis
- Insights
- Conclusion
Project Background
I completed a comprehensive case study as a Data Analyst Fellow, belonging to the Fall 2023 Data Science For All (DS4A) Career Advancement program cohort. This intensive two-month, part-time fellowship provided a dynamic learning environment, emphasizing real-world business cases and live lectures to hone data analysis skills.
Notably, the DS4A program boasts a rigorous selection process, with an acceptance rate of less than 5%, underscoring its commitment to fostering excellence in data science. In recognition of its impact, the program earned the impressive distinction of being ranked #6 on LinkedIn's list of Top Startups in 2022.
LinkedIn List Link: https://www.linkedin.com/pulse/linkedin-top-startups-2022-50-us-companies-rise-linkedin-news/
Executive Summary
The dynamic realm of retail is a trillion-dollar industry that shapes economies and lifestyles. As shoppers browse aisles and shelves and scroll web pages, their actions and behavior generate data input based on what they are purchasing and showing interest in. Every transaction spins a narrative, and every trend offers a glimpse into the future. The goal is to understand how data analysts can understand these shopping behavior patterns to enhance product promotion and marketing activities, ultimately encouraging repeat business from these customers.
Company XYZ, an online retailer, needs your help to understand its shoppers and to understand the health of its business.
To help answer this question, this case study will look to achieve the following:
- Create a dynamic dashboard to visualize customer retention rates, shopping trends, and significant findings across customer segments and time periods.
- Prepare data and calculate segment-specific metrics on order and customer data for analysis, and assess the role of segments in retention KPI significance.
- Categorize customers into defined segments and timeframes, computing retention rates as a guiding metric for customer loyalty.
A note of caution: This analysis is entirely backward-looking, relying on past data. That's valuable information to help answer the questions above.
Data Description and Cleaning
Excel Spreadsheets
- BigSupplyCo_Categories: Retail Categories and Retail Category IDs
- BigSupplyCo_Customers: Customer Names, IDs, and Addresses
- BigSupplyCo_Departments: Retail Department Names and ID's
- BigSupplyCo_Orders: All customer orders
- BigSupplyCo_Products: Product Names and IDs
Remarks on Data: I conducted a thorough data cleansing and transformation process within Power BI, implementing substantial sanitization measures. The cleansing involved a meticulous review and replacement of erroneous data points, ensuring the integrity and reliability of the dataset. The use of Power BI facilitated a streamlined and effective approach, enhancing the overall quality of the data and setting a solid foundation for subsequent analysis of data from the years 2015-2018.
Initial Exploratory Analysis
I used Power BI to create measures and employed DAX for better analysis, calculating metrics like retention, profit, and expense rates.
I utilized Power BI to generate an overview table, presenting key metrics like retention rate, profit rate, expense rate, total revenue, and total expense across various time periods. This helped to fuel my analysis.
Insights
Finding 1
I used a bar chart and an area to compare the sales, expenses, and profits across a time series. Sales have been consistent over the last several years and averaged close to 3MM from 2015-2017. However, expenses have also remained high during the same period and averaged 2.4MM from 2015-2017, which has limited profitability to an average of 0.6MM.
I had limited data about expenses, but my primary recommendation based on the visualization would be for the company to assess its costs and take proactive steps to reduce them.
Finding 2
I employed a scatter chart visualization in Power BI, featuring an average retention rate trend line, which helped to identify top-selling retail categories with above-average retention rates. The scatterplot was complemented with tables showcasing relevant metrics, and a slicer facilitated the display of top categories and retention rates across time periods.
Among the noteworthy findings, Cardio Equipment emerged as the highest-selling category, boasting 37,000 orders, 8MM in Sales, and over 800K in profits. Additionally, categories such as Men's Footwear and Men's Clothing displayed above-average retention rates, contributing significant revenue and decent profits. Given the already higher retention, this analysis highlights an opportunity to reduce expenses and strategize for increased revenue and earnings in these promising categories while focusing on expense reduction.
Finding 3
In the two visualizations below, I used tables and slicers to create analyses that delve deeper into the product and customer data. The Cardio Equipment, Men's Footwear, and Men's Clothing Categories account for 44% of all orders, nearly 29,000 of the 65K orders.
The table highlights the top products, which provides company XYZ with the opportunity to focus on promoting these through a low-cost advertising campaign since the products are high-selling.
I also created a visualization to show the customers with the most orders. This gives Company XYZ an opportunity to make targeted low-cost advertising campaigns to get these customers to purchase items again.