Skip to content

Data Analyst Associate Practical Exam Submission

You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

Background

PetMind is a retailer of products for pets. They are based in the United States. PetMind sells products that are a mix of luxury items and everyday items. Luxury items include toys. Everyday items include food. The company wants to increase sales by selling more everyday products repeatedly. They have been testing this approach for the last year. They now want a report on how repeat purchases impact sales.

Dataset

  • product_id: Nominal. The unique identifier of the product. Missing values are not possible due to the database structure.
  • category: Nominal. The category of the product, one of 6 values (Housing,Food, Toys, Equipment, Medicine, Accessory).
  • animal: Nominal. The type of animal the product is for. One of Dog, Cat,Fish, Bird. Missing values should be replaced with “Unknown”.
  • size: Ordinal. The size of animal the product is for. Small, Medium,Large. Missing values should be replaced with “Unknown”.
  • price: Continuous. The price the product is sold at. Can be any positive value, round to 2 decimal places.Missing values should be replaced with the overall median price.
  • sales: Continuous. The value of all sales of the product in the last year.This can be any positive value, rounded to 2 decimal places. Missing values should be replaced with the overall median sales
  • rating: Discrete. Customer rating of the product from 1 to 10.Missing values should be replaced with 0.
  • repeat_purchase: Nominal. Whether customers repeatedly buy the product (1) or not (0). Missing values should be removed.

Tasks

Submit your answers directly in the workspace provided.

  1. For every column in the data:
  • State whether the values match the description given in the table above.
  • State the number of missing values in the column.
  • Describe what you did to make values match the description if they did not match.
  1. Create a visualization that shows how many products are repeat purchases. Use the visualization to:
  • State which category of the variable repeat purchases has the most observations
  • Explain whether the observations are balanced across categories of the variable repeat purchases
  1. Describe the distribution of all of the sales. Your answer must include a visualization that shows the distribution.

  2. Describe the relationship between repeat purchases and sales. Your answer must include a visualization to demonstrate the relationship

Task 1

For every column in the data:

  • State whether the values match the description given in the table above.
  • State the number of missing values in the column.
  • Describe what you did to make values match the description if they did not match.

Overview

There are some columns that have missing values in the following format:

  • Twenty five (25) null values in category in the form of '-'
  • 'unlisted' found in price column
  • One hundred and fifty (150) null values in rating

Additionally, the size column has a mix of uppercase and lowercase

Hidden code
## there are 1500 rows and 8 columns
df.shape
Hidden code
  • No null values in product_id