Duplicate of Certification - Data Analyst Associate - Pet Supplies

Data Analyst Associate Practical Exam Submission

You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

Company Background

PetMind is a retailer of products for pets. They are based in the United States. PetMind sells products that are a mix of luxury items and everyday items. Luxury items include toys. Everyday items include food.

Company questions

  • The company wants to increase sales by selling more everyday products repeatedly.
  • They have been testing this approach for the last year. They now want a report on how repeat purchases impact sales.

Importing Modules

# import libraries
import pandas as pd

1. Loading and Assessing the Repeat Purchases Table

# Load the data from the URL
url = 'https://s3.amazonaws.com/talent-assets.datacamp.com/pet_supplies_2212.csv'
# Read the CSV file at the URL and save it as repeat
repeat = pd.read_csv(url, sep=',')

# Table shape and preview
print('\nShape of the data (rows, columns):')
print(repeat.shape)
repeat.head()
# Check column data types and non-null counts
repeat.info()
# Checking for duplicates
repeat.duplicated().sum()
# Checking for empty values
repeat.isnull().sum()
# Checking unique values of variables
print(len(repeat['product_id'].unique()))
print(repeat['category'].unique())
print(repeat['animal'].unique())
print(repeat['size'].unique())
print(repeat['rating'].unique())
print(repeat['repeat_purchase'].unique())
# Summary statistics for categorical (object) columns
repeat.describe(include='object')
# Summary statistics for numerical (int64, float64) columns
repeat.describe(include='number')
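The separate checks above can also be gathered into a single per-column table. The following is a minimal sketch, not part of the original submission: the helper name `assess` and the toy rows are hypothetical and are used only so the example runs without downloading the file.

```python
import pandas as pd

def assess(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: data type, missing count, unique count."""
    return pd.DataFrame({
        'dtype': frame.dtypes.astype(str),
        'missing': frame.isnull().sum(),
        'unique': frame.nunique(dropna=True),  # NaN is not counted as a value
    })

# Toy rows for illustration only; these values are NOT from the PetMind file
toy = pd.DataFrame({
    'product_id': [1, 2, 3],
    'size': ['small', 'Small', None],
    'rating': [7.0, None, 5.0],
})

summary = assess(toy)
print(summary)
```

Calling `assess(repeat)` on the loaded table should give the same kind of overview in one place, which makes it easier to spot columns that need cleaning.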

2. Data Issues and Cleaning Data

Data Issues

Tidiness Issue

  1. Values should share one format: the size column should use consistent capitalization.

Quality issue

  1. Missing values in the category and rating variables.
  2. Wrong data type: the price variable is stored as a string and should be converted to float64.
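One possible way to address the issues listed above is sketched below on a small toy table. This is an illustration under stated assumptions, not the cleaning actually applied in this report: the toy values, the 'unlisted' placeholder, and the fill policies ('Unknown' for category, median for price, 0 for rating) are all hypothetical and should be replaced with whatever the data dictionary specifies.

```python
import pandas as pd

# Toy rows for illustration only; placeholder strings in the real file may differ
toy = pd.DataFrame({
    'category': ['Food', None, 'Toys'],
    'size': ['small', 'MEDIUM', 'Large'],
    'price': ['30.5', 'unlisted', '20'],
    'rating': [7.0, None, 5.0],
})

clean = toy.copy()

# Tidiness: use one capitalization style for size
clean['size'] = clean['size'].str.capitalize()

# Quality: label missing categories explicitly (assumed policy)
clean['category'] = clean['category'].fillna('Unknown')

# Quality: convert price from string to float; non-numeric strings become NaN
clean['price'] = pd.to_numeric(clean['price'], errors='coerce')

# Quality: fill missing price with the median and missing rating with 0 (assumed policy)
clean['price'] = clean['price'].fillna(clean['price'].median())
clean['rating'] = clean['rating'].fillna(0)

print(clean)
print(clean.dtypes)
```

Using `errors='coerce'` turns any non-numeric price string into NaN instead of raising an error, so the missing prices can then be handled in one explicit step.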

# Check the raw data description
# Display summary statistics for the categorical columns
print('\nThe categorical data statistical summary is as below:\n')
print(repeat.describe(include='object'))
# Display summary statistics for the numerical columns
print('\nThe numerical data statistical summary is as below:\n')
print(repeat.describe(include='number'))

# Work on a copy so the raw data in repeat stays intact
df = repeat.copy()
# Make product_id the index; set_index returns a new DataFrame, so assign the result
df = df.set_index('product_id')