
Practical Exam: Pet Box Subscription

Instructions

  • Use any tools that you are comfortable with to perform the tasks required (for example Tableau, Power BI, MS Excel, Python, R).
  • Write your solutions in the workspace provided from your certification page.
  • Include all of the visualizations you create to complete the tasks.
  • Visualizations must be visible in the published version of the workspace. Links to external visualizations will not be accepted.
  • You do not need to include any code.
  • You must pass all criteria to pass this exam. The full criteria can be found

You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

Case Study

Background

PetMind is a retailer of products for pets. They are based in the United States. PetMind sells products that are a mix of luxury items and everyday items. Luxury items include toys. Everyday items include food. The company wants to increase sales by selling more everyday products repeatedly. They have been testing this approach for the last year. They now want a report on how repeat purchases impact sales.

Data

The dataset contains the sales records in the stores last year. The dataset can be downloaded from here.

Tasks

Submit your answers directly in the workspace provided.

  1. For every column in the data:
     a. State whether the values match the description given in the table above.
     b. State the number of missing values in the column.
     c. Describe what you did to make values match the description if they did not match.
  2. Create a visualization that shows how many products are repeat purchases. Use the visualization to:
     a. State which category of the variable repeat purchases has the most observations.
     b. Explain whether the observations are balanced across categories of the variable repeat purchases.
  3. Describe the distribution of all of the sales. Your answer must include a visualization that shows the distribution.
  4. Describe the relationship between repeat purchases and sales. Your answer must include a visualization to demonstrate the relationship.
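A minimal pandas sketch of the numbers behind Tasks 2 to 4 is below. It runs on a small made-up DataFrame rather than the real file, and it assumes the repeat-purchase flag is a 0/1 column named `repeat_purchase`; both the data and the column name are illustrative assumptions. The plotting calls are left as comments so the sketch runs without a display.

```python
import pandas as pd

# Hypothetical toy data standing in for pet_supplies_2212.csv;
# the column name `repeat_purchase` (0 = no, 1 = yes) is an assumption.
df = pd.DataFrame({
    "repeat_purchase": [1, 1, 1, 0, 1, 0, 1, 1],
    "sales": [1100.5, 980.0, 1250.25, 700.0, 1020.75, 650.5, 1300.0, 890.0],
})

# Task 2: number and share of observations in each repeat-purchase category
counts = df["repeat_purchase"].value_counts()
shares = df["repeat_purchase"].value_counts(normalize=True)
most_common = counts.idxmax()
# Rough balance check: size of the largest group relative to the smallest
imbalance_ratio = counts.max() / counts.min()
# counts.plot(kind="bar") would draw the bar chart the task asks for

# Task 3: distribution of all sales
sales_summary = df["sales"].describe()
sales_skew = df["sales"].skew()
# df["sales"].plot(kind="hist", bins=20) would draw a histogram

# Task 4: sales split by repeat-purchase group
sales_by_group = df.groupby("repeat_purchase")["sales"].agg(["count", "mean", "median"])
# df.boxplot(column="sales", by="repeat_purchase") would compare the groups visually

print(counts)
print(f"most common: {most_common}, ratio: {imbalance_ratio:.2f}")
print(sales_summary)
print(sales_by_group)
```

The largest-to-smallest ratio is only one rough way to judge balance; reading the bar heights or the `shares` proportions serves the same purpose.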

Task 1: Data Validation and Description

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
petbox_data = pd.read_csv('pet_supplies_2212.csv')

# Preview the first rows, then check dtypes and non-null counts per column
print(petbox_data.head())
petbox_data.info()
# Count missing values in each column (Task 1b)
petbox_data.isna().sum()
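A one-table view of Task 1b can be sketched as follows, on hypothetical rows that include an illustrative `'unlisted'` price placeholder. Note that `isna()` only counts real missing values: placeholders such as `'-'` or `'unlisted'` stay invisible until they are replaced or coerced, which is why the missing counts are re-checked after cleaning.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with the kinds of problems the cleaning steps address
df = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "category": ["Food", "-", "Toys", "Food"],
    "price": ["28.03", "unlisted", "15.5", None],
    "rating": [7.0, np.nan, 4.0, np.nan],
})

# One row per column: dtype, missing count, and missing share in percent
missing_report = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_missing": df.isna().sum(),
    "pct_missing": (df.isna().mean() * 100).round(1),
})
print(missing_report)
```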
# Replace missing ratings with 0, then confirm no ratings remain missing
petbox_data['rating'] = petbox_data['rating'].fillna(0)
petbox_data.isna().sum()
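# Standardise size labels to one case so variants such as 'Small' and 'SMALL' match
petbox_data['size'] = petbox_data['size'].str.lower()
# Summary statistics for the numeric columns
petbox_data.describe()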
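If the data description lists size labels in a particular case, a variant of the cell above can check the cleaned values against that list. This sketch assumes the allowed labels are `Small`, `Medium` and `Large` in title case (an assumption; substitute the labels the description actually gives), so it uses `str.title()` instead of `str.lower()` on hypothetical raw values.

```python
import pandas as pd

# Hypothetical raw size labels with inconsistent case and stray whitespace
sizes = pd.Series(["small", "MEDIUM", " Large", "medium", "SMALL"])

# ASSUMED canonical labels; replace with whatever the data description lists
ALLOWED_SIZES = {"Small", "Medium", "Large"}

# Trim whitespace and title-case so "MEDIUM" and "medium" both become "Medium"
clean_sizes = sizes.str.strip().str.title()

# Any value still outside the allowed set needs a manual look
invalid = clean_sizes[~clean_sizes.isin(ALLOWED_SIZES)]
print(clean_sizes.value_counts())
print("invalid:", list(invalid))
```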
# Number of distinct product IDs (compare with the row count to check uniqueness)
productids_count = petbox_data['product_id'].nunique()

# Distinct values and their counts for each categorical column
category_list = petbox_data['category'].unique()
category_count = petbox_data['category'].nunique()

animal_list = petbox_data['animal'].unique()
animal_count = petbox_data['animal'].nunique()

size_list = petbox_data['size'].unique()
size_count = petbox_data['size'].nunique()

print(productids_count)

print(category_count, category_list)
print(animal_count, animal_list)
print(size_count, size_list)
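The per-column variables above can also be produced in one loop. The sketch below uses a hypothetical four-row DataFrame with the same column names; the `is_unique` check assumes each row should carry a distinct `product_id`.

```python
import pandas as pd

# Hypothetical rows using the column names from the cells above
df = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "category": ["Food", "Toys", "-", "Food"],
    "animal": ["Dog", "Cat", "Dog", "Fish"],
    "size": ["small", "large", "small", "medium"],
})

# Summarise every categorical column in one pass instead of one variable per column
summary = {col: sorted(df[col].unique()) for col in ["category", "animal", "size"]}
for col, values in summary.items():
    print(f"{col}: {len(values)} unique -> {values}")

# ASSUMPTION: product_id identifies a product, so it should not repeat
ids_unique = df["product_id"].is_unique
print("product_id unique:", ids_unique)
```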
# Replace the '-' placeholder in category with 'Unknown'
petbox_data['category'] = petbox_data['category'].replace('-', 'Unknown')
category_list = petbox_data['category'].unique()

print(category_list)
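One detail worth knowing about the replacement above: `Series.replace` with a plain string matches whole cell values only, while the `.str.replace` accessor matches substrings. The hypothetical `'Pet-care'` value below shows the difference.

```python
import pandas as pd

# Hypothetical category values; "Pet-care" contains a hyphen but is not a placeholder
category = pd.Series(["Food", "-", "Toys", "Pet-care", "-"])

# Series.replace with a plain string swaps only cells whose entire value is "-"
clean = category.replace("-", "Unknown")

# The .str accessor replaces every "-" substring, which would damage "Pet-care"
substring_version = category.str.replace("-", "Unknown", regex=False)

print(list(clean))
print(list(substring_version))
```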
# Round sales to 2 decimal places
petbox_data['sales'] = petbox_data['sales'].round(2)

petbox_data.head()
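`Series.round(2)` is the method form of the built-in `round(series, 2)`, and the two return the same values. A small check on made-up sales figures:

```python
import numpy as np
import pandas as pd

# Hypothetical sales values with more than two decimals
sales = pd.Series([1860.62, 963.6012, 898.30, 1179.999])

# Method form; produces the same result as the built-in round(sales, 2)
sales_2dp = sales.round(2)
same_as_builtin = sales_2dp.equals(round(sales, 2))

# Rounding a second time should change nothing
stable = np.allclose(sales_2dp, sales_2dp.round(2))
print(sales_2dp.tolist(), same_as_builtin, stable)
```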
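# Convert price to numeric; non-numeric entries become NaN
petbox_data['price'] = pd.to_numeric(petbox_data['price'], errors='coerce')
petbox_data['price'] = petbox_data['price'].round(2)

print(petbox_data.head())

# Re-check missing values: coerced prices now show up as NaN
petbox_data.isna().sum()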
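Coercion turns unparseable prices into `NaN`, so the final `isna().sum()` reports them as missing and they still need a fill rule. The sketch below assumes the rule is to use the median of the valid prices (an assumption; apply whatever the data description specifies) and uses a hypothetical `'unlisted'` placeholder as the non-numeric value.

```python
import pandas as pd

# Hypothetical raw prices: numbers stored as text, a placeholder, and a true missing value
raw_price = pd.Series(["28.03", "unlisted", "15.5", "31.476", None])

# Non-numeric text becomes NaN instead of raising an error
price = pd.to_numeric(raw_price, errors="coerce")
n_coerced = int(price.isna().sum())  # includes the originally missing value

# ASSUMPTION: missing prices are filled with the median of the valid prices
price = price.fillna(price.median()).round(2)
print(n_coerced, price.tolist())
```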