Data Analyst Associate Practical Exam Submission
You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.
You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.
Today, i will start to working with DataCamp Associate Certification. The dataset used here is 'Pet Box Subscription' in PetMind company. PetMind is a pet retailer based in the United States, offering a range of products that cater to both luxury and everyday needs of pets. The company's product line includes high-end toys and food essentials that are regularly consumed by pets.
To boost sales and improve profitability, PetMind has been experimenting with a new approach for the last year. The company has been trying to sell more everyday items repeatedly to increase their sales figures. They believe that this strategy could help them achieve sustainable growth by creating a more predictable revenue stream.
As part of their efforts to evaluate the impact of repeat purchases on sales, PetMind has decided to commission a report. The report will be using Microsoft Excel and PowerBI Desktop to analyze how repeat purchases of everyday items can impact sales growth over time. This will help them better understand the potential benefits of this strategy and make informed decisions about their product offerings and marketing efforts.
Task 1 - Data Validation
The original data is 1500 rows and 8 columns. The first thing i did was importing the data and checking the NULL or missing values. After i found them all, the NULL or missing values will be replaced by 'Unknown' for column 'category', 'animal', and 'size'. For 'price' and 'sales' column, the missing values will be replaced by overall median price. Finally, missing values in 'rating' column will be replaced by zero. The rule also said to remove rows in repeat_purchase column if any null values exist, but there is no missing value and still leaving 1500 rows of data. Looking at the remaining columns:
- There were 1500 unique product_id and no missing values, as expected.
- There were 7 category unexpectedly. 25 rows were named '-'.
- There were 4 animal type as expected.
- There were 3 unique size as expected, but not in the right format. Need cleaning.
- Price range values are between 12.85 to 54.16 and rounded to two decimals as expected, but 150 items were unlisted.
- There were contain sales numerical value as expected with no missing values, and already rounded to two decimal places.
- The ratings value range from 1-9 as expected, with 150 values contains 'NA'.
- There were 2 unique repeat_purchase as expected.
Data Cleaning method:
- There were 25 rows of missing values in category column. I replace it with 'Unknown'
- There were a variation in the 'size' column, it have multiple value of size with different casing. I clean the data by replacing all lowercase and uppercase values to capital in each first letter (initcap).
- There were 150 of 'unlisted' values in price column along with NA values in rating. I replaced unlisted value in price column with median of 27.93 and replaced NA with 0.
- The product_id, animal, sales, and repeat purchase remain unaffected.
Task 2 - Create Visualization
a. Which category of the variable repeat purchases has the most observations?
After cleaning the data, I imported data from Excel workbook to PowerBI for making visualizations easier. Thus i found out that the most category that has the most repeat purchases is Equipment, followed by Medicine, Housing and Food.
Visualizations from different perspective: Bar Chart. PetMind company want to increase sales through Food as everyday items, but the fact says the otherwise. We want to suggest the company to focus on the equipment categories to increase sales because it is the most repeated item bought by the customers.
b. Explain whether the observations are balanced across categories of the variable repeat purchases
The observation is clearly not balanced across categories. Animal equipment have the highest total repeat purchase for 221 times among the others category. Medicine, Housing, Food and Toys having almost the same repeat purchases of 150 times average. In the other hand, Accesory is the lowest repeat purchases with 70 times repeat purchases. The rest 14 items is considered unknown categories.
Task 3 - Describing Distribution of All Sales
While we want to focus on how repeated purchase affecting sales, we should analyze how number of sales getting distributed first. Looking at histogram above, we can see the most sales occured between 60k to 150k with average sales of 91,264. There is no much sales above 150k and very small amount of sales above 210k. We can't say much about this distribution, but let's see this sales distribution with another contextual.
Sales Heatmap
Total Sales by Animal Size Bar Chart
After we grouped the sales within category bins, we can see now that Equipment and Toys still leading the sales in PetMind with small animal type. In the sales heatmap, Cats have the highest selling in equipment and toys. This implied that large number of transcation could be affected by only one category, size and one animal, thus does not guarantee a high volume of sales overall. Even after the company tried to increase sales by selling more everyday product repeatedly, food sales were still in the third place. But we still don't have enough evidence to say that PetMind strategy to increase everyday product repeatedly is worked. So we will move on to the next findings.
Task 4 - Relationship Between Repeat Purchases and Sales
How does the repeat purchases impact sales?
Finally we will see from the repeated purchase perspective. Assuming non repeated sales was non-pet box subscription and repeated sales were people who subscribed to the pet box PetMind provide, repeated purchase did increase total sales by 44,64%.
To answers the most first question in the task, that the company wants to increase sales by selling more everyday products repeatedly, we need to compare sales between everyday items and luxury items based on repeated sales that been done.
Based on all of above, we could summarize that even though food sales were below Toys category, repeated purchase did increase food sales by 33,33% of the food sales that not repeated. So the store decisions to provide more everyday item up to Pet Box Subscription were right, because it was impacted on higher sales. Further analysis should be done to understand if repeated purchase did impact sales or not.