Pet Supply Sales

Task 1

For every column in the data:

a. State whether the values match the description given in the table above.

b. State the number of missing values in the column.

c. Describe what you did to make values match the description if they did not match.

product_id: There are 1500 unique values that match the description given. There are no missing values. No changes were made to this column.

category: There were six unique values that matched the six given in the data dictionary. There were 25 missing values. The missing values were replaced with “Unknown” as per the data description.

animal: There were four unique values that matched the four given in the data dictionary. There were no missing values so no changes were made to this column.

size: This column has three categories that match those in the description. There were no missing values and no changes were made to this column.

price: The values of this column ranged from 12.85 to 54.16, which is consistent with the description given. 150 values were missing. The missing values were replaced with the median value of the remaining data, which was 28.07.

sales: The values of this column were all positive and rounded to 2 decimal places, which is consistent with the description given. There were no missing values. No changes were made to this column.

rating: The values of this column ranged from 1 to 10, which is consistent with the description given. There were 150 missing values. The missing values were replaced with 0 as per the data description.

repeat_purchase: There were two unique values (0 and 1) that matched the description given. There were no missing values so no changes were made to this column.
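The three replacements described above (category, price, rating) can be sketched in plain Python. This is a minimal illustration, not the code used for the report: the function names, the sample rows, and the use of None to mark a missing value are all assumptions, and the real file may mark missing values differently.

```python
from statistics import median


def count_missing(rows, column):
    """Number of rows whose value in `column` is missing (None, by assumption)."""
    return sum(1 for r in rows if r[column] is None)


def clean_rows(rows):
    """Return cleaned copies of the rows; the input list is left untouched."""
    # Median of the observed prices only, as in the price step above.
    price_median = median(r["price"] for r in rows if r["price"] is not None)
    cleaned = []
    for r in rows:
        row = dict(r)
        if row["category"] is None:
            row["category"] = "Unknown"
        if row["price"] is None:
            row["price"] = price_median
        if row["rating"] is None:
            row["rating"] = 0
        cleaned.append(row)
    return cleaned


# Hypothetical sample rows, not taken from the real data set.
sample = [
    {"category": "Food", "price": 20.0, "rating": 7},
    {"category": None, "price": None, "rating": None},
    {"category": "Toys", "price": 30.0, "rating": 4},
]
missing_before = {c: count_missing(sample, c) for c in ("category", "price", "rating")}
cleaned = clean_rows(sample)
```

Counting missing values before cleaning gives the figures reported under (b) for each column, and computing the median from the non-missing prices keeps the replacement value independent of the gaps it fills.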

Task 2

Create a visualization that shows how many products are repeated purchases. Use the visualization to:

a. State which category of the variable repeat purchases has the most observations

b. Explain whether the observations are balanced across categories of the variable repeat purchases

There are seven possible categories in this data. The most common category is equipment. Food, housing, medicine, and toys each trail it by more than 70 repeat purchases. The categories are therefore unbalanced, with equipment accounting for the most repeat purchases. The team should focus on increasing repeat purchases of food, as that is their main goal.
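The counts behind a comparison like this can be tallied as follows. This is a rough sketch under assumptions: the rows are hypothetical, and repeat_purchase is taken to be 1 for a repeat purchase and 0 otherwise, as described in Task 1.

```python
from collections import Counter


def repeat_purchases_by_category(rows):
    """Count repeat purchases (repeat_purchase == 1) per product category."""
    return Counter(r["category"] for r in rows if r["repeat_purchase"] == 1)


# Hypothetical sample rows, not taken from the real data set.
sample = [
    {"category": "Equipment", "repeat_purchase": 1},
    {"category": "Equipment", "repeat_purchase": 1},
    {"category": "Equipment", "repeat_purchase": 1},
    {"category": "Food", "repeat_purchase": 1},
    {"category": "Food", "repeat_purchase": 0},
    {"category": "Toys", "repeat_purchase": 1},
    {"category": "Toys", "repeat_purchase": 1},
]

counts = repeat_purchases_by_category(sample)
top_category, top_count = counts.most_common(1)[0]
# How far each other category trails the leader.
gaps = {c: top_count - n for c, n in counts.items() if c != top_category}
```

The gaps dictionary is what supports a statement such as "each trails by more than 70": every value in it would need to exceed 70 on the real data.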

Task 3

Describe the distribution of all of the sales. Your answer must include a visualization that shows the distribution.

As the company thinks that the number of sales will be important, we should look at how the sales are distributed.

Looking at all the sales, we can see from the graphic below that most sales fell between 250 and 2000, with the average around 1000. Sales appear to be approximately normally distributed. There are some outliers above 2000, but these are very uncommon.

When looking to increase the number of sales, the company should look at products with sales under 1,000. Focusing on the customers who buy those products should help increase profits.
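The statements about the typical range, the average, and the outliers above 2000 can be checked numerically alongside the plot. The sketch below is illustrative only: the sample values are made up, and the 250 and 2000 bounds are simply the ones quoted in this task.

```python
from statistics import mean, median


def describe_sales(sales, low=250, high=2000):
    """Summarize a list of sales values against the quoted range."""
    n = len(sales)
    return {
        "mean": mean(sales),
        "median": median(sales),
        # Share of values inside the range described as typical.
        "share_in_range": sum(1 for s in sales if low <= s <= high) / n,
        # Share of values treated as high outliers.
        "share_above_high": sum(1 for s in sales if s > high) / n,
    }


# Hypothetical sample values, not taken from the real data set.
sample_sales = [300.0, 800.0, 1000.0, 1200.0, 2500.0]
summary = describe_sales(sample_sales)
```

Comparing the mean with the median is a quick sanity check on the shape: a mean noticeably above the median points to a right skew rather than a symmetric, normal-looking distribution.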

Task 4

Describe the relationship between repeat purchases and sales. Your answer must include a visualization to demonstrate the relationship.

Finally, we want to combine the two pieces of information to see how repeat purchases impact sales. So far, equipment purchases over $1000 look ideal, but we need to look at the two variables together to see if this is realistic.

Using a box plot, we can compare the categories on repeat purchases and sales. We originally wanted to look at equipment, but its interquartile range of sales is lower than that of every other category except Unknown. This suggests that the majority of equipment sales may be lower than those of other categories. However, this could also be an effect of equipment having the largest number of repeat purchases.
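The interquartile ranges that a box plot displays can also be computed directly, which makes the comparison between categories explicit. This is a sketch under assumptions: the rows are hypothetical, and the "inclusive" quartile method is one reasonable choice, so a plotting library may show slightly different box edges.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_by_category(rows):
    """Interquartile range of sales for the repeat purchases in each category."""
    groups = defaultdict(list)
    for r in rows:
        if r["repeat_purchase"] == 1:
            groups[r["category"]].append(r["sales"])
    result = {}
    for category, values in groups.items():
        # quantiles() with n=4 returns the three quartile cut points.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        result[category] = q3 - q1
    return result


# Hypothetical sample rows, not taken from the real data set.
sample = [
    {"category": "Equipment", "repeat_purchase": 1, "sales": s}
    for s in (100.0, 200.0, 300.0, 400.0, 500.0)
] + [
    {"category": "Food", "repeat_purchase": 1, "sales": s}
    for s in (200.0, 600.0, 1000.0, 1400.0, 1800.0)
]

iqr = iqr_by_category(sample)
narrowest = min(iqr, key=iqr.get)
```

Note that quantiles() needs at least two values per group, so a category with a single repeat purchase would have to be skipped or handled separately.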

Based on all of the above, we recommend that the company focus first on food with under $1000 in repeat purchases, while keeping an open mind about including accessories, housing, and medicine, as they have the lowest sales. Further analysis is needed to establish whether repeat purchases really do affect sales. The company should also look at repeat purchases with lower sales so that we can analyze whether category has any effect on the number of repeat purchases.