Skip to content

Data Analyst Associate Practical Exam Submission

Background: PetMind

PetMind is a retailer of products for pets. They are based in the United States. PetMind sells products that are a mix of luxury items and everyday items. Luxury items include toys. Everyday items include food. The company wants to increase sales by selling more everyday products repeatedly. They have been testing this approach for the last year. They now want a report on how repeat purchases impact sales.

Task 1

Checking and Cleaning of Data

product id

a. This column had an accurate description from the given table.

b. There were no duplicates or missing values.. I used DISTINCT to check for duplicates if the numbers of row changed, and IS NULL for checking Null Values

category

a. The data is Nominal like the description table said but there is another value besides the 6 given category

b. Checking the table using the NOT IN function in SQL to see any values outside the given category, I found 25 rows with "-". I also inserted "_" in the NOT IN to see if there are any other different value or mispelling of the 6 category, but found nothing else.

c. According to the description, missing values should be replaced by "unknown" not "-", so using the UPDATE and REPLACE function in SQL, I replaced all "_" with "Unknown":

UPDATE dbo.pet_supp SET category = REPLACE(category, '-', 'Unknown') WHERE category = '-'

animal

a. The data is Nominal like the description table said.

b. Using the NOT IN function to check for values not under Dog, Cat, Fish, Bird, it returned with 0 rows. There were no missing values :

SELECT animal FROM dbo.pet_supp WHERE animal NOT IN ('Dog', 'Cat', 'Fish', 'Bird')

size

a. The data was accurate to the description, it is ordinal. There was also no extra values besides the 3 given in the description (Small, Medium, Large).

b. Querying using SELECT only I could already see inconsistencies with the capitalization that needs to be fixed. Checking with NOT IN for extra or missing values came back with 0 rows

c. I used UPDATE function along with the LOWER function to update the column and make every value lowercase.

UPDATE dbo.pet_supplies SET size = LOWER(size)

price

a. The price is contiuous, although a lot were not in 2 decimasl places. This might just be because of SQL since opneing it in excel it is in 2 decimal places

b. Using the IS NULL I found --Rows with Null values.

c. I started with getting the median using percentile_cont because according to the description any null value should be replaced with the median.

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price) OVER() FROM dbo.pet_supp WHERE price IS NOT NULL -- the median is 28.0649995803833

after getting the median, I used UPDATE to change null values into 28.0649995803833, and then used UPDATE agian to ROUND() all values in the column

`-- updating the Null values in price column with the median UPDATE dbo.pet_supp SET price = 28.0649995803833 WHERE price IS NULL

-- acording to the discription it should also be rounded into 2 decimal places SELECT ROUND(price, 2) FROM dbo.pet_supp`

sales

a. The description is accurate for the sales table alhtough some values are not rounded in 2 decimal places.

b. Using IS NULL, I found no null values.

c. I updated the sales column with the ROUND() function to put the values in 2 decimal places.

UPDATE dbo.pet_supp SET sales = ROUND(sales, 2)

rating

a. The description is accurate, although there were NULL values

b. Using IS NULL, I found -- null values. I also used checked any values outside the 1-10 rating with the NOT IN and found no value exceeding the ratings.

c. According to the discription missing replaced with 0, so I used update to change Null values to 0

UPDATE dbo.pet_supp SET rating = 0 WHERE rating IS NULL

repeat_purchase

a. This column had an accurate description from the given table.

b. I checked fo missing or incorrect values using IS NULL and NOT NULL in the WHERE clause and found 0 null values

Task 2

How Many Products are Repeat Purchases?

According to the visualization 60% or 906 of 1500 of the product_id recorded in the last year are repeat purchases. This is likely the effect of testing the approach of selling more everyday products repeatedly by PetMind.

a. The category with the most repeat purchase is equipment with 221 observation. it also has the overall count of purchase.

The average count of repeat purchase across category, not including 'Unknown' is 148.67 and a Mean Absolute Deviation of 27.46

The median count of repeat purchase across category is also not including 'Unknown' is 151.5

b. The observation is mostly balanced across category with two categories creating the imbalance. Minus the 'Unknown' 4 of the 6 categories are very close to the mean and median. The two other categories; equipment with 221 observation and accesories with 70 observation for repeat purchases are the only two distant from the other 4 values creating a bigger mean absolute deviation.

Task 3

What is the Distribution of All Sales?

Main Visualization:

Looking at the sales by category:

As stated in task 2 plus the visualizations here looking at the sum of all sales, equipment is the most bought(370) and most total sales(349K) category for pet supplies, however looking at the average Toys come out as top as equipment drops to 3rd likely due to Toys having the second highest average price of 32.26 compared to equipment's 26.96.

This being said does not mean we should focus exclusively on toys as looking at the count, equipment are the most popular products by a big margin, causing it to top the total amount of sales. One idea could be a bundle of an equipment and a toy, where we could leverage the popularity of equipment products to sell a more expensive product like toys bundle to maximize sales.

Looking at the sales by animal:

I also looked at the sales by animal, and it is a similar sitauation with the sales by category. Products for Cat are the most bought(567) and brings the most sales(574K), but by average sales Birds come out on top, similar to Toys, likely because of the high average price of 40.78 compared to Cat's 27.65.

Unlike in the categories visualization however, where the toys was always near in sales and count with equipment, Bird products are behind the count and sum of sales, and behind Cat products by a big margin in those areas. Looking at the visualization overall Cats are the most popular pet for PetMind customer, while Bird products toping the average is likely because of the high cost of taking care of birds.

The sales for Cat products should be maintained, but perhaps maximizing sales for Dog products could be a a good secondary priority. Dog Products has the second most sum of sales with 390,000~ although still way behind Cat products, and the second highest average sales(1058) behind birds also due to its high average price(31.86), second to birds.

Task 4

Is There a Relationship Between Repeat Purchase and Sales

Main Visualization

This Visualization looks at the relationship between repeat purchase and sales to see if PetMind's approach of selling more everyday products repeatedly had an impact last year. A quick glance and we can see that repeat purchase lead the total amount of sales in both category of products or pet animal. Considering this was the first year of this approach, continuing the previous year's approach is likely to increase the gap between repeat and non-repeat purchases.