PetMind Report
PetMind is a company that wants to understand how it can increase sales by selling more of its products that are repeatedly purchased.
Below is an analysis of the dataset that was provided for this task:
Cleaning and Validation of the Data
I started by downloading and opening the data in MS Excel for validation. I discovered there were 1501 rows with 8 columns. I then went column by column to ensure the data was ready for discovery. Below are my findings:
- product_id column: The given description of nominal was correct and there were no missing values in the column, as expected.
- category column: The given description of nominal was correct. I verified there were 25 missing values, which I replaced with the requested “Unknown” description. Excluding the “Unknown” records, there were 6 values in the category column ("Housing, Food, Toys, Equipment, Medicine, Accessory"), as to be expected.
- animal column: The given description of nominal was correct. I verified there were no missing values and that the products are for 4 types of animals ("Dog, Cat, Fish, Bird"), as expected.
- size column: The description of ordinal was correct. I verified there were no missing values. There were indeed 3 sizes of products: "Small", "Medium", and "Large". However, the letter case of the values was inconsistent across the 3 sizes. I corrected all the values in the column using Excel's PROPER function so they matched the given expected values.
- price column: The given description of continuous was correct. I counted 150 missing values, which were recorded as "unlisted" instead of a number. These values were replaced with the “overall median price”, as requested. The median was calculated to be 28.065 but was rounded to 28.07, as it was requested this value be rounded to two decimal places.
- sales column: The given description of continuous was correct. I verified the values were all positive values and there were no missing values in the column.
- rating column: The given description of discrete was correct. However, I found 150 missing values in the column, recorded as "NA" instead of a discrete value. These values were replaced with a “0” rating, as requested.
- repeat_purchase column: The given description of nominal was correct. I verified there were no missing values and the column appeared as expected.
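The validation steps above were carried out in Excel. As a rough cross-check, the same replacements could be sketched in Python. This is a minimal sketch, not the method used for this report: the function name `clean_rows` is hypothetical, rows are assumed to be dictionaries of raw strings keyed by column name, and a missing category is assumed to arrive as an empty string or `None`.

```python
from statistics import median

# Assumption: how a blank category cell appears once the file is read in.
MISSING_CATEGORY = ("", None)


def clean_rows(rows):
    """Return cleaned copies of the product rows (dicts of raw strings)."""
    # Overall median of the prices that are actually listed, to 2 decimals.
    listed = [float(r["price"]) for r in rows if r["price"] != "unlisted"]
    median_price = round(median(listed), 2)

    cleaned = []
    for r in rows:
        row = dict(r)  # copy so the raw data stays untouched
        if row["category"] in MISSING_CATEGORY:
            row["category"] = "Unknown"
        # Title-case the size, similar to a spreadsheet proper-case function.
        row["size"] = row["size"].strip().title()
        if row["price"] == "unlisted":
            row["price"] = median_price
        else:
            row["price"] = round(float(row["price"]), 2)
        # "NA" ratings become 0, as requested in the task description.
        row["rating"] = 0 if row["rating"] == "NA" else int(row["rating"])
        cleaned.append(row)
    return cleaned
```

Python's `round` operates on binary floating-point numbers, so a value such as 28.065 may not round the same way it does in Excel; `decimal.Decimal` is an option where exact half-up rounding matters.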
Repeat Product Purchases
- After validating the data, I moved on to answering the business question PetMind posed: To increase sales we want to sell more everyday products repeatedly. How did repeat purchases impact sales?
- It is clear from the graph above that, of all the products purchased repeatedly, Equipment was the most frequently observed, eclipsing the next closest category, Medicine, by almost 100. The distribution is skewed toward Equipment in both this graph, which shows repeat purchases, and the next. It is important to note that the next several categories differ by only a few purchases. This could say more about consumer preferences than about the type of product PetMind is offering, but more research is needed to determine whether this is significant.
- For a more thorough analysis, a visual of the counts of products only purchased once is included below:
Single Product Purchases
- From this graph we can see that Equipment still remains at the top, while Toys rank behind Equipment as the second-largest category of products purchased only once.
- What is not clear from the above, however, is the distinction between "luxury" and "everyday" items. These could be broken down into sub-categories, but that information was not provided for this analysis.
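The counts behind the repeat-purchase and single-purchase graphs can be reproduced with a simple tally. The sketch below is illustrative only: `purchase_counts` is a hypothetical helper, and it assumes `repeat_purchase` is coded as 1 for repeatedly purchased products and 0 otherwise, which this report does not state.

```python
from collections import Counter


def purchase_counts(rows):
    """Count products per (category, repeat_purchase) pair.

    Assumes repeat_purchase is coded 1 (repeat) or 0 (single purchase).
    """
    return Counter((r["category"], int(r["repeat_purchase"])) for r in rows)


def ranked(counts, repeat_flag):
    """Categories for one flag value, ordered from most to least frequent."""
    subset = {cat: n for (cat, flag), n in counts.items() if flag == repeat_flag}
    return sorted(subset.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `ranked(purchase_counts(rows), 1)` would list the categories in the order shown in the repeat-purchase graph, and `ranked(purchase_counts(rows), 0)` in the order of the single-purchase graph.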
Distribution of All Sales from Dataset
This visual includes products that were not repeat purchases.
- Something notable is the mean and median values for all sales from the past year at PetMind both hovered close to 1000. The actual value for the average was 996.60 (rounded) and the median value was 1000.83.
- Additional analysis shows several outliers just beyond roughly 1800 in sales, as well as several extreme outliers beyond 2200. The extreme outliers pull the mean toward them, and we see a large variance because of this. If we were to ignore the most extreme outliers, those around 1800 would be about the same distance from the mean as the lower data points, which fall between roughly 200 and 300.
- It is possible the extreme outlier values could be described as "luxury" products. The below graph might confirm this.
This visual focuses on the distribution of repeat purchases.
- The extreme outliers of sales belong to the Toys category. They appear to be the most expensive collection of Toys PetMind offers, and thus seemingly the most profitable. The next collection of outliers, around 1800, which we saw in the previous graph, spans different categories. However, take note of the outliers in the Equipment and Toys categories. These categories accounted for the most and second-most single purchases, and their outliers pull their median values toward them.
- We know PetMind classifies at least some Toys as "luxury" products and at least some Food items as "everyday" products. What we do not know from this data is the sub-categories, or which of the other given categories (Housing, Equipment, Medicine, and Accessory) could be classified as "luxury" or "everyday", and to what degree.
- Compared against earlier years, this sales data could show whether more "everyday" products are in fact being sold over time. However, this analysis used only the previous year's data. Further analysis is needed to show tangible trends, to determine whether each category falls under "luxury" or "everyday", and to establish whether "everyday" products truly sold in greater quantity than "luxury" products.
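This section refers to outliers and extreme outliers without stating the rule used to identify them. One common box-plot convention is Tukey's fences, which flag values more than 1.5 interquartile ranges outside the quartiles (and, for extreme outliers, more than 3). The sketch below applies that convention as an assumption; the function name `tukey_outliers` and the choice of the "inclusive" quantile method are illustrative, not taken from the original analysis.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences.

    k=1.5 is the usual box-plot whisker rule; k=3 is often used
    for "extreme" outliers. Both are conventions, not fixed rules.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

Running this once per category on the `sales` column, with `k=1.5` and then `k=3`, would separate the ordinary outliers from the extreme ones in the way the graphs above suggest.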
Relationship Between Repeat Purchases and Sales
- Of the products repeatedly purchased, Equipment, followed by Toys and then Food, accounted for the top half of PetMind's sales last year, with each of the three categories contributing over 160,000 in sales.
- One issue with this graph is that, although it does show Equipment contributing more to sales than other categories, we do not know what PetMind classifies as "luxury" within each category. More expensive items that were repeatedly purchased could be skewing the data.
- This graph shows a breakdown of prices by category. We can see that both Equipment and Food have a lower median price than Toys. Yet if we focus on determining whether PetMind sold more "everyday" products than "luxury" products in this past year, we might want to pay attention to the median values for Equipment and Toys. Both are skewed toward their extreme outliers and appear to have more outliers than other categories.
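The category totals discussed in this section amount to summing the `sales` column per category for repeat purchases only. A minimal sketch follows, assuming rows are dictionaries keyed by column name and that `repeat_purchase` is coded 1 for repeat purchases; `sales_by_category` is a hypothetical helper name.

```python
from collections import defaultdict


def sales_by_category(rows, repeat_only=True):
    """Total sales per category, largest first.

    With repeat_only=True, only rows with repeat_purchase == 1 are summed
    (the 1/0 coding is an assumption about the dataset).
    """
    totals = defaultdict(float)
    for r in rows:
        if repeat_only and int(r["repeat_purchase"]) != 1:
            continue
        totals[r["category"]] += float(r["sales"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Comparing the output with `repeat_only=True` against `repeat_only=False` gives a quick view of how much of each category's sales came from repeatedly purchased products.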
Product Sizes Compared to Their Price Breakdown
- This final graph shows an alternative view of sales for repeat purchases, focusing on size as a contributing factor. We can see the average prices per size per category above. It shows even though small Toys on average cost more than most other products regardless of size, consumers are still willing to spend money on them.
- We can also see that a small Food product costs more on average than the medium size, which accounted for a large number of individual sales.
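The average price per size within each category, as described above, is a grouped mean. The following sketch shows one way to compute it, assuming the price column has already been cleaned to numeric values; `mean_price_by_category_size` is a hypothetical name used only for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_price_by_category_size(rows):
    """Average price for each (category, size) pair, to 2 decimals."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["category"], r["size"])].append(float(r["price"]))
    return {key: round(mean(prices), 2) for key, prices in groups.items()}
```

Looking up `("Food", "Small")` and `("Food", "Medium")` in the result would allow a direct check of the observation that small Food products cost more on average than medium ones.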
Conclusions
- We know PetMind sells a mix of "luxury" and "everyday" products. We also know Equipment contributed the most repeat sales over the past year. Further analysis of price, together with a clear rule for classifying a product as "luxury" or "everyday" regardless of its category, would help PetMind find out whether it truly achieved its goal of selling more "everyday" products repeatedly over the past year. It does seem, however, that repeat purchases, whether "everyday" or not, carried more sales weight than products purchased only once.