Data Analyst Associate Practical Exam Submission

You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

Task 1

product_id: This column has 1500 unique values ranging from 1 to 1500, and all values match the description given in the data set. There are no missing values in this column. No changes are made to this column.

category: This column has 25 missing values, recorded as '-'. The rest of the values match the description provided in the data set and take one of 6 values (Accessory, Equipment, Food, Housing, Medicine, Toys). The missing values have been replaced with ‘Unknown’.
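The replacement described above can be sketched in plain Python. The sample values and variable names are hypothetical, not taken from the actual data set, and the sketch only assumes that '-' is the sole missing-value marker in this column.

```python
# Hypothetical sample of the category column; the real column has 1500 rows.
raw_categories = ["Food", "-", "Toys", "Equipment", "-"]

# Count the '-' placeholders before replacing them.
missing_count = raw_categories.count("-")

# Replace the '-' placeholder with 'Unknown' and leave other values untouched.
clean_categories = ["Unknown" if value == "-" else value for value in raw_categories]

print(missing_count)
print(clean_categories)
```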

animal: There are no missing values in this column. The values match the description given in the data set and contain one of 4 values (bird, cat, dog, fish). No change has been made to this field.

size: This column has no missing values. However, the letter case is inconsistent: the same size appears in both upper and lower case. For example, the size ‘medium’ appears as both ‘MEDIUM’ and ‘medium’. All values have been made consistent with those listed in the description (Small, Medium, Large).
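One way to make the case consistent, shown here as a minimal sketch on hypothetical sample values, is to capitalize each entry so that every spelling collapses to Small, Medium, or Large.

```python
# Hypothetical sample of the size column with mixed letter case.
raw_sizes = ["MEDIUM", "medium", "Small", "large", "LARGE"]

# str.capitalize() upper-cases the first letter and lower-cases the rest.
clean_sizes = [value.strip().capitalize() for value in raw_sizes]

# After cleaning, only the three documented values should remain.
allowed = {"Small", "Medium", "Large"}
unexpected = set(clean_sizes) - allowed

print(clean_sizes)
print(unexpected)
```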

price: This column has 150 missing values, recorded as ‘unlisted’. According to the description in the table, the price must be a continuous numeric value rounded to two decimal places. The records with a price of ‘unlisted’ have therefore been replaced with the overall median price, which is 28.07 when rounded to two decimal places.
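The median imputation can be illustrated as follows. This is a sketch on hypothetical sample prices, and it assumes the median is computed over the listed (numeric) prices only; the write-up does not state this explicitly.

```python
from statistics import median

# Hypothetical sample of the price column as raw strings.
raw_prices = ["30.50", "unlisted", "25.00", "28.07", "unlisted"]

# Assumption: the median is taken over the listed prices only.
listed_prices = [float(p) for p in raw_prices if p != "unlisted"]
median_price = round(median(listed_prices), 2)

# Replace 'unlisted' with the median and round every price to two decimals.
clean_prices = [
    median_price if p == "unlisted" else round(float(p), 2)
    for p in raw_prices
]

print(median_price)
print(clean_prices)
```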

sales: This column has values between 200 and 2300, given to two decimal places. The values match the description given in the data set. There are no missing values in this column. No changes are made to this column.

rating: This column has 150 missing values, recorded as ‘NA’. The rest of the values match the description given in the table and range from 1 to 10. The missing values have been replaced with 0.
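A minimal sketch of the rating clean-up, again on hypothetical sample values: ‘NA’ entries become 0, and the remaining entries are checked against the documented 1 to 10 range.

```python
# Hypothetical sample of the rating column as raw strings.
raw_ratings = ["7", "NA", "10", "NA", "1"]

# Replace the 'NA' marker with 0 and convert everything else to an integer.
clean_ratings = [0 if r == "NA" else int(r) for r in raw_ratings]

# Every rating that was not missing should fall within 1..10.
in_range = all(1 <= r <= 10 for r in clean_ratings if r != 0)

print(clean_ratings)
print(in_range)
```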

repeat_purchase: This field has no missing values and takes the values 0 or 1. The values match the description given in the table. No changes are made to this column.

Task 2

The bar chart below shows the number of repeat purchases for each product category.

The following conclusions can be drawn from the above chart:

  • The category 'Equipment' has the most repeat purchases, with 221.
  • The categories are not balanced. 'Equipment' has the most repeat purchases and 'Unknown' the fewest, with 14. 'Medicine', 'Housing', and 'Food' have nearly equal repeat purchases, averaging around 152; 'Toys' has slightly fewer at 145, and 'Accessory' has 70.
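The per-category counts behind a bar chart like this can be computed by summing the repeat_purchase flag within each category. The sketch below uses hypothetical sample rows, so its numbers do not reproduce the figures quoted above.

```python
from collections import Counter

# Hypothetical (category, repeat_purchase) rows; repeat_purchase is 0 or 1.
rows = [
    ("Equipment", 1), ("Equipment", 1), ("Food", 1),
    ("Food", 0), ("Unknown", 0), ("Toys", 1),
]

# Summing the 0/1 flag per category gives the number of repeat purchases.
repeat_counts = Counter()
for category, repeat_purchase in rows:
    repeat_counts[category] += repeat_purchase

# The category with the most repeat purchases in this sample.
top_category, top_count = repeat_counts.most_common(1)[0]

print(dict(repeat_counts))
print(top_category, top_count)
```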

Task 3

The chart below shows the distribution of sales across all products.

The following observations can be noted from the above chart:

  • The highest number of sales fall in the range of 1000 to 1050.
  • The majority of sales fall between 650 and 1100, with a dip at 900, which has only 8 sales.
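The counts per range in a histogram like this come from grouping sales values into fixed-width bins. The sketch below assumes a bin width of 50 (suggested by the 1000 to 1050 range quoted above, but not stated in the write-up) and uses hypothetical sample values.

```python
from collections import Counter

# Hypothetical sample of sales values.
sales = [1003.2, 1049.99, 1020.0, 905.5, 660.0, 1098.7]

BIN_WIDTH = 50  # assumed bin width; not confirmed by the write-up

# Map each value to the lower edge of its bin, e.g. 1020.0 falls in 1000..1050.
bin_counts = Counter(int(value // BIN_WIDTH) * BIN_WIDTH for value in sales)

# The bin containing the most values in this sample.
busiest_bin, busiest_count = bin_counts.most_common(1)[0]

print(sorted(bin_counts.items()))
print(busiest_bin, busiest_count)
```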

Task 4

The charts below show the relationship between repeat purchases and sales.

From the charts above, we can observe the following:

  • Repeat purchases make up 60% of total sales, and non-repeat purchases constitute the remaining 40%.
  • The category 'Equipment' has the highest rate of repeat purchases.
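The share of total sales attributable to repeat purchases can be computed as sketched below. The rows are hypothetical, so the resulting percentage is illustrative only and does not reproduce the split reported above.

```python
# Hypothetical (repeat_purchase, sales) rows; repeat_purchase is 0 or 1.
rows = [(1, 900.0), (1, 600.0), (0, 500.0)]

total_sales = sum(sales for _, sales in rows)
repeat_sales = sum(sales for repeat, sales in rows if repeat == 1)

# Fraction of total sales that comes from repeat-purchase products.
repeat_share = repeat_sales / total_sales
non_repeat_share = 1 - repeat_share

print(f"repeat: {repeat_share:.0%}, non-repeat: {non_repeat_share:.0%}")
```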