Skip to content

Task 1

From taking a quick look at the data, you are pretty certain it isn't quite as it should be. You need to make sure all of the data is clean before you start your analysis. The table below shows what the data should look like.

Write a query to return a table that matches the description provided.

Do not update the original table.

Column NameCriteria
product_idNominal. The unique identifier of the product.
Missing values are not possible due to the database structure.
categoryNominal. The category of the product, one of 6 values (Housing, Food, Toys, Equipment, Medicine, Accessory).
Missing values should be replaced with “Unknown”.
animalNominal. The type of animal the product is for. One of Dog, Cat, Fish, Bird.
Missing values should be replaced with “Unknown”.
sizeOrdinal. The size of animal the product is for. Small, Medium, Large.
Missing values should be replaced with “Unknown”.
priceContinuous. The price the product is sold at. Can be any positive value, round to 2 decimal places.
Missing values should be replaced with 0.
salesContinuous. The value of all sales of the product in the last year. This can be any positive value, rounded to 2 decimal places.
Missing values should be replaced with the overall median sales.
ratingDiscrete. Customer rating of the product from 1 to 10.
Missing values should be replaced with 0.
repeat_purchaseNominal. Whether customers repeatedly buy the product (1) or not (0).
Missing values should be removed.
Spinner
DataFrameas
clean_data
variable

SELECT 
    product_id AS product_id,
    COALESCE(NULLIF(category, '-'), 'Unknown') AS category,  
    COALESCE(NULLIF(animal, ''), 'Unknown') AS animal,       
    COALESCE(INITCAP(NULLIF(size, '')), 'Unknown') AS size,  
    COALESCE(
        CASE 
            WHEN price ~ '^[0-9]+(\.[0-9]*)?$' THEN ROUND(price::NUMERIC, 2)
            ELSE 0 
        END,  
        0
    ) AS price,
	COALESCE(
        ROUND(sales::NUMERIC, 2), 
        COALESCE(
            (SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sales) 
             FROM public.pet_supplies 
             WHERE sales IS NOT NULL),
            0  -- Ensure sales defaults to 0 if median is NULL
        )
    ) AS sales,
    COALESCE(rating, 0) AS rating,  
    repeat_purchase  
FROM public.pet_supplies
WHERE repeat_purchase IS NOT NULL;

Task 2

You want to show whether sales are higher for repeat purchases for different animals. You also want to give a range for the sales.

Write a query to return the animal, repeat_purchase indicator and the avg_sales, along with the min_sales and max_sales. All values should be rounded to whole numbers.

You should use the original pet_supplies data for this task.

Spinner
DataFrameas
animal_sales
variable
SELECT 
animal,
repeat_purchase,
ROUND(AVG(sales::NUMERIC),0) AS avg_sales,
ROUND(MIN(sales::NUMERIC),0) AS min_sales,
ROUND(MAX(sales::NUMERIC),0) AS max_sales
FROM public.pet_supplies
GROUP BY animal, repeat_purchase

Task 3

The management team want to focus on efforts in the next year on the most popular pets - cats and dogs - for products that are bought repeatedly.

Write a query to return the product_id, sales and rating for the relevant products.

You should use the original pet_supplies data for this task.

Spinner
DataFrameas
popular_pet_products
variable
SELECT product_id, sales, rating
FROM public.pet_supplies
WHERE animal IN ('Cat', 'Dog') AND repeat_purchase = 1;