Data Analyst Associate Practical Exam Submission
You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.
You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.
Task 1
Category - The values in this column should be one of Housing, Food, Toys, Equipment, Medicine, or Accessory, as given in the description. Several rows did not match these values, so I used Excel to clean the data. First, I applied a filter to the column to see which rows did not meet the criteria. Then I used the Replace function in Excel to replace the missing values with "Unknown". In total, 25 rows did not meet the criteria.
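The cleaning itself was done in Excel. As a rough illustration only, the same filter-and-replace step could be sketched in Python as follows. The sample rows and the `clean_category` helper are hypothetical and are not taken from the exam data.

```python
# Allowed categories, taken from the column description in the write-up.
ALLOWED = {"Housing", "Food", "Toys", "Equipment", "Medicine", "Accessory"}

def clean_category(value):
    """Return the value if it is an allowed category, otherwise "Unknown"."""
    text = (value or "").strip()
    return text if text in ALLOWED else "Unknown"

# Hypothetical sample rows; the real data set is not reproduced here.
raw = ["Food", "-", "Toys", "", None, "Housing"]
cleaned = [clean_category(v) for v in raw]

# Count how many rows were changed, mirroring the "rows that did not meet
# the criteria" figure reported above.
replaced = sum(1 for before, after in zip(raw, cleaned) if before != after)

print(cleaned)
print(replaced)
```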
Animal - There are 4 unique values that match the description given: Dog, Cat, Fish, and Bird. There are no missing values. No changes were made to this column.
Size - There are 3 unique values that match the description given: Small, Medium, and Large. There are no missing values. No changes were made to this column.
Price - The values in this column should be positive and rounded to 2 decimal places, as given in the description. Several rows did not meet the criteria, so I used Excel to clean the data. First, I applied a filter to the column to see which rows did not meet the criteria; there were 150 such rows. Then I used the MEDIAN() function in Excel to calculate the median of the price column, which was 28.065. Since the column holds currency, I changed the number format to Currency, which displays every number in the column rounded to two decimal places, so 28.065 is rounded to 28.07. I then used the Replace function in Excel to replace the missing values with 28.07. I also noticed that, although the values were rounded to two decimal places, some rows showed only one decimal digit. The Currency format addresses this as well: those rows now show a zero in the second decimal position. After cleaning the data, all of the values in the column met the criteria.
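To make the arithmetic concrete, here is a minimal Python sketch of median imputation followed by rounding to two decimal places. It assumes half-up rounding, which matches the 28.065 to 28.07 result described above. The sample prices and both helper functions are hypothetical; only the value 28.065 echoes the write-up.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import median

def to_cents(value):
    """Round to two decimal places, rounding halves up (assumed behaviour)."""
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def fill_missing_with_median(prices):
    """Replace None entries with the median of the known prices."""
    known = [Decimal(str(p)) for p in prices if p is not None]
    fill = to_cents(median(known))
    return [to_cents(p) if p is not None else fill for p in prices]

# Hypothetical sample, chosen so that the median is 28.065.
prices = [25.5, None, 28.06, 28.07, 31.2]
cleaned = fill_missing_with_median(prices)

# One-decimal values such as 25.5 gain a trailing zero, like the Currency format.
print([str(p) for p in cleaned])
```

Decimal arithmetic is used here because binary floats cannot represent 28.065 exactly, so Python's built-in `round` may not round it up as expected.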
Sales - The values of this column were consistent with the description given: all of them were positive and rounded to 2 decimal places. All the rows in the sales column met the criteria. However, I noticed that some rows showed only one decimal digit. To address this, I changed the format of the column to Currency so that all rows display two decimal places. As a result, the rows that originally had only one decimal digit now have a zero added in the second decimal position. After this change, all of the values in the column met the criteria.
Rating - The values in this column should be between 1 and 10, as given in the description. Several rows did not meet the criteria, so I used Excel to clean the data. First, I applied a filter to the column to see which rows did not meet the criteria. Then I used the Replace function in Excel to replace the missing values with 0. In total, 150 rows did not meet the criteria.
repeat_purchase - There are 2 unique values that match the description given: 0 and 1. There are no missing values, and all the rows met the criteria. No changes were made to this column.
A)
In the visualization, the Equipment category has the most observations for repeat purchases, with a total of 221.
B)
Repeat purchases are balanced only across the Food, Housing, Medicine, and Toys categories. Overall, the observations are not balanced, because these categories differ significantly from the remaining Accessory, Equipment, and Unknown categories.
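The counts behind both answers come from the chart. As a minimal sketch, the same tally could be computed in Python as below. The rows are hypothetical toy data and do not reproduce the figures reported above.

```python
from collections import Counter

# Hypothetical (category, repeat_purchase) rows; not the exam data.
rows = [
    ("Equipment", 1), ("Equipment", 1), ("Equipment", 0),
    ("Food", 1), ("Food", 0),
    ("Toys", 0),
]

# Number of repeat purchases per category, and total rows per category.
repeat_counts = Counter(cat for cat, rep in rows if rep == 1)
totals = Counter(cat for cat, _ in rows)

# Category with the most repeat-purchase observations.
top_category, top_count = repeat_counts.most_common(1)[0]

# Share of repeat purchases within each category, as a rough balance check.
shares = {cat: repeat_counts[cat] / totals[cat] for cat in totals}

print(top_category, top_count)
print(shares)
```

Comparing the per-category counts side by side is one simple way to judge whether the observations are balanced.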
Task 3
The distribution of sales is right-skewed: most sales fall in the lower range, with a long tail towards higher sales values. This is evident in the histogram, where the highest peak occurs in the bin from roughly 979.94 to 1078.94, indicating that a large number of sales are concentrated in this range. Beyond this range, the frequency gradually decreases, which suggests that most products do not reach higher sales figures.
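For illustration, the sketch below shows how equal-width histogram bins can be counted, together with a rough skew check, in Python. The `histogram` helper and the toy values are hypothetical. A mean above the median is only a heuristic hint of right skew, not a proof.

```python
from statistics import mean, median

def histogram(values, start, width, bins):
    """Count values in `bins` equal-width bins beginning at `start`."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:  # values outside the bin range are ignored
            counts[index] += 1
    return counts

# Hypothetical toy values with a long tail to the right; not the exam data.
sales = [1, 2, 2, 3, 3, 3, 4, 9, 15]

counts = histogram(sales, start=0, width=4, bins=4)
right_skew_hint = mean(sales) > median(sales)

print(counts)
print(right_skew_hint)
```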
- For non-repeat purchases, the median sales value is $1030.
- For repeat purchases, the median sales value is $977.
- Ignoring the outliers in the box plot, non-repeat sales fall within the range of $288 to $1795, while repeat sales fall within the range of $287 to $1567.
- Based on the graph, 75% of non-repeat sales are greater than or equal to $795, and 75% of repeat sales are greater than or equal to $737.
- Further analysis should be conducted to determine which specific product is being sold and how it is impacting sales repetition.
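The medians and quartiles above were read from the box plot. As a minimal sketch, the same summary can be computed per group with Python's `statistics` module. The two lists are hypothetical toy values, not the exam data, and the quartile method used by the charting tool may differ from the default used here.

```python
from statistics import median, quantiles

# Hypothetical sales values per group; not the exam data.
groups = {
    "non_repeat": [10, 20, 30, 40, 50, 60, 70],
    "repeat": [5, 15, 25, 35, 45, 55, 65],
}

summary = {}
for name, values in groups.items():
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = quantiles(values, n=4)
    summary[name] = {"q1": q1, "median": median(values), "q3": q3}

# About 75% of the values in a group lie at or above its first quartile (q1).
for name, stats in summary.items():
    print(name, stats)
```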
✅ When you have finished...
- Publish your Workspace using the option on the left
- Check the published version of your report:
- Can you see everything you want us to grade?
- Are all the graphics visible?
- Review the grading rubric. Have you included everything that will be graded?
- Head back to the Certification Dashboard to submit your practical exam