
Data Analyst Associate Practical Exam Submission

You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

Task 1

Let's analyze the values of the fields in the table:

  • claim_id: Nominal. It is the unique identifier. Missing values are not possible due to the structure of the database.
  • time_to_close: Discrete. The number of days it takes to close the claim. Any positive value. Replace missing values with the average, AVG(time_to_close). There are no missing values in this field.
  • claim_amount: Continuous. The initial claim amount in Brazilian currency, rounded to two decimal places. Replace missing values with the average, AVG(claim_amount). There are no missing values in this field.
  • amount_paid: Continuous. The amount finally paid for the claim in Brazilian currency, rounded to two decimal places. Replace missing values with the average, AVG(amount_paid). This field had 36 missing values, which were replaced with the mean value of 27683.28.
  • location: Nominal. Location of the claim, one of "RECIFE", "SAO LOUIS", "FORTALEZA" or "NATAL". Remove missing values. No rows had to be removed.
  • individuals_on_claim: Discrete. Number of individuals in the claim, with a minimum of one person. Replace missing values with 0.
  • linked_cases: Nominal. Whether this claim is linked to other cases: "TRUE" or "FALSE". Replace missing values with "FALSE". This field had 26 missing values, which were replaced.
  • cause: Nominal. Cause of the food infection, one of "vegetable", "meat" or "unknown". Replace missing values with "unknown". In addition, 16 "VEGETABLE" values were standardized to "vegetable" and 14 " Meat" values to "meat".
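The missing-value counts quoted above (for example, 36 in amount_paid and 26 in linked_cases) can be checked with a short script. This is a minimal sketch in plain Python, assuming the data is a CSV file in which a missing cell is either empty or the literal text "NA"; the `count_missing` helper and the three-row sample are illustrative and not part of the exam data.

```python
import csv
import io

# Assumption: a missing cell is either empty or the literal text "NA"
MISSING_MARKERS = {"", "NA"}


def count_missing(rows):
    """Return {column: number of missing cells} for a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_missing = value is None or value.strip() in MISSING_MARKERS
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


# Made-up sample for illustration; with the real data, read the CSV file instead
sample = io.StringIO(
    "claim_id,amount_paid,linked_cases\n"
    "1,100.50,TRUE\n"
    "2,NA,NA\n"
    "3,,FALSE\n"
)
rows = list(csv.DictReader(sample))
print(count_missing(rows))  # {'claim_id': 0, 'amount_paid': 2, 'linked_cases': 1}
```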

Before uploading the .csv file, I improved the quality of the data with the following steps:

  1. Calculate AVG(amount_paid) and use it to replace the 36 NA values in that field.
  2. Replace the 26 "NA" values in the linked_cases field with "FALSE".
  3. Replace the 16 "VEGETABLES" values in the cause field with "vegetable".
  4. Replace all 14 " Meat" values in the cause field with "meat".
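The cleaning steps can be sketched in plain Python as follows. This is a minimal illustration, assuming each row is a dict of raw CSV strings and that "" or "NA" marks a missing cell; the `clean` function and the sample rows are hypothetical. The cause fix trims and lowercases the label and merges the plural, so it covers both the "VEGETABLE" and "VEGETABLES" spellings mentioned in this report.

```python
from statistics import mean

MISSING = ("", "NA")  # assumed markers for a missing cell


def clean(rows):
    """Return cleaned copies of dict rows, following steps 1-4."""
    # Step 1: average of the amount_paid values that are present
    known = [float(r["amount_paid"]) for r in rows if r["amount_paid"] not in MISSING]
    avg_paid = round(mean(known), 2)

    cleaned = []
    for row in rows:
        r = dict(row)  # copy so the caller's rows are not modified
        raw_paid = r["amount_paid"]
        r["amount_paid"] = avg_paid if raw_paid in MISSING else float(raw_paid)
        # Step 2: a missing linked_cases value is treated as "FALSE"
        if r["linked_cases"] in MISSING:
            r["linked_cases"] = "FALSE"
        # Steps 3-4: trim and lowercase cause (" Meat" -> "meat"), merge the plural
        cause = r["cause"].strip().lower()
        if cause in ("", "na"):
            cause = "unknown"  # rule from the field description for cause
        r["cause"] = "vegetable" if cause == "vegetables" else cause
        cleaned.append(r)
    return cleaned


rows = [
    {"amount_paid": "100.00", "linked_cases": "TRUE", "cause": "meat"},
    {"amount_paid": "NA", "linked_cases": "NA", "cause": " Meat"},
    {"amount_paid": "300.00", "linked_cases": "FALSE", "cause": "VEGETABLES"},
]
for r in clean(rows):
    print(r)
```

Copying each row keeps the raw data available for comparison after cleaning.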

Task 2

Next, I created a visualization of the number of claims in each location to show (a) which category of the location variable has the most observations and (b) whether the observations are balanced across the categories of the location variable.

To show the number of claims in each location, I used a pie chart and a bar graph. RECIFE accounts for 44% of the claims, with 885 out of a total of 2,000. The observations are therefore not balanced: an even split across the four locations would give each one about 25% (500 claims).
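The counts and shares behind this chart can be reproduced with a frequency count. This is a minimal sketch using `collections.Counter`; the `location_shares` helper is hypothetical, and the input list is rebuilt from the figures in this report (Recife's 885 from this task, the other three locations from Task 4), which add up to 2,000 claims.

```python
from collections import Counter


def location_shares(locations):
    """Return {location: (claims, percent of all claims)}, largest first."""
    counts = Counter(locations)
    total = sum(counts.values())
    return {loc: (n, round(100 * n / total, 2)) for loc, n in counts.most_common()}


# Counts from this report: Recife from Task 2, the other three from Task 4
locations = (["RECIFE"] * 885 + ["SAO LOUIS"] * 517
             + ["FORTALEZA"] * 311 + ["NATAL"] * 287)
shares = location_shares(locations)
print(shares)
# {'RECIFE': (885, 44.25), 'SAO LOUIS': (517, 25.85), 'FORTALEZA': (311, 15.55), 'NATAL': (287, 14.35)}
```

Comparing each share with the even split of 25% makes the imbalance explicit: RECIFE is well above it, while FORTALEZA and NATAL are well below.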

To show which location has the most observations within each cause, I chose a heat map and a bar graph. RECIFE has the most claims for both the "meat" and "unknown" causes, while "vegetable" is the cause with the fewest claims.
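A heat map of this kind is drawn from a location-by-cause contingency table. The sketch below builds that table in plain Python, assuming the data is available as (location, cause) pairs; the `crosstab` helper and the sample pairs are made up for illustration.

```python
from collections import Counter


def crosstab(pairs):
    """Count (location, cause) pairs into table[location][cause], filling gaps with 0."""
    counts = Counter(pairs)
    locations = sorted({loc for loc, _ in counts})
    causes = sorted({cause for _, cause in counts})
    # Counter returns 0 for pairs that never occur
    return {loc: {c: counts[(loc, c)] for c in causes} for loc in locations}


# Made-up rows for illustration only
pairs = [("RECIFE", "meat"), ("RECIFE", "meat"), ("RECIFE", "unknown"),
         ("NATAL", "vegetable"), ("NATAL", "meat")]
for location, row in crosstab(pairs).items():
    print(location, row)
```

Each cell of the table corresponds to one colored cell of the heat map.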

Task 3

The distribution of the time it takes to close claims is unimodal, with its peak in segment 2 (1,433 claims). To create the segments, I split time_to_close into 100-day intervals, giving 6 segments.
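The segmentation can be sketched as integer division by the segment width. This assumes the segments start at 0 (segment 1 = 0-99 days, segment 2 = 100-199 days, and so on), which the report does not state explicitly; `segment`, `segment_shares` and the sample values are hypothetical.

```python
from collections import Counter


def segment(days, width=100):
    """Segment number for a time_to_close value: 0-99 -> 1, 100-199 -> 2, ..."""
    return days // width + 1


def segment_shares(times, width=100):
    """Return {segment: (claims, percent rounded to a whole number)}."""
    counts = Counter(segment(t, width) for t in times)
    total = len(times)
    # Note: Python's round() sends exact .5 ties to the even neighbour
    return {s: (n, round(100 * n / total)) for s, n in sorted(counts.items())}


times = [76, 150, 160, 170, 250]  # made-up values
print(segment_shares(times))  # {1: (1, 20), 2: (3, 60), 3: (1, 20)}
```

With the report's figures, 1,433 of 2,000 claims is 71.65%, which rounds to the 72% quoted for segment 2.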

Segment 2 alone accounts for 72% of the claims (1,433), segment 3 for a further 24%, and the remaining 4% are spread across segments 1, 4, 5 and 6.

Task 4

The relationship between location and time to close is shown in a bar graph and a box plot:
  • Fortaleza: 311 claims, with an average time_to_close of 185.30 days
  • Natal: 287 claims, with an average time_to_close of 185.92 days
  • Sao Louis: 517 claims, with an average time_to_close of 184.60 days
  • Recife: 885 claims, with an average time_to_close of 187.17 days
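The per-location averages come from a group-by followed by a mean. This is a minimal sketch in plain Python, assuming the data is available as (location, time_to_close) pairs; `mean_by_location` and the sample rows are hypothetical.

```python
from collections import defaultdict


def mean_by_location(rows):
    """Return {location: (claims, mean time_to_close rounded to 2 dp)}."""
    groups = defaultdict(list)
    for location, days in rows:
        groups[location].append(days)
    return {loc: (len(v), round(sum(v) / len(v), 2)) for loc, v in groups.items()}


rows = [("NATAL", 180), ("NATAL", 191), ("RECIFE", 187), ("RECIFE", 188)]  # made-up
print(mean_by_location(rows))  # {'NATAL': (2, 185.5), 'RECIFE': (2, 187.5)}
```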

These values show that the average time to close is similar in all four locations. The box plots, however, show high outliers (values above the upper whisker) most often in Sao Louis, followed by Recife, Fortaleza and Natal, and Sao Louis also has the outliers furthest from the mean of the data set. Most claims are closed near the average: between 157 and 206 days in Fortaleza and Natal, between 157 and 203 days in Recife, and between 161 and 206 days in Sao Louis.
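The outliers in a box plot are usually defined by fences placed 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below computes those values for one location, assuming the box plots used this common 1.5 × IQR (Tukey) rule; the report does not say which tool produced them, and tools can compute quartiles slightly differently. `box_stats` and the sample values are hypothetical.

```python
from statistics import quantiles


def box_stats(times):
    """Quartiles, 1.5 * IQR whisker fences and upper outliers for one location."""
    # method="inclusive" matches the linear interpolation of NumPy's default percentile
    q1, median, q3 = quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        # points above the upper fence are the "outliers above" in a box plot
        "upper_outliers": sorted(t for t in times if t > upper_fence),
    }


times = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # made-up values
stats = box_stats(times)
print(stats["q1"], stats["q3"], stats["upper_outliers"])
```

Running this per location and counting `upper_outliers` gives a numeric version of the outlier ranking read from the box plots.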
