
Data Analyst Associate Practical Exam Submission

You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

Task 1

For every column in the data:

  • State whether the values match the description given in the table above.
  • State the number of missing values in the column.
  • Describe what you did to make values match the description if they did not match.

SOLUTION

claim_id: There are 2000 unique values in this column, which match the description given. There are no missing values. No changes were made to this column.

time_to_close: The values of this column range from 76 to 518, which is consistent with the description given. There are no missing values. No changes were made to this column.

claim_amount: The values of this column were all rounded to 2 decimal places, with a minimum of 1637.94 and a maximum of 76106.80. There are no missing values. However, the values were stored as text with the Brazilian currency symbol, 'R$', so I removed the symbol and converted the data type from text to decimal to match the description given.

amount_paid: The values of this column range from 1516.72 to 52498.75 and are all rounded to 2 decimal places. There are 36 missing values. The missing values were replaced with the median of the remaining data, which was 20105.70, and the data type was converted from text to decimal to match the description.

location: This column has four categories that match those in the given description. There are no missing values and no changes were made to this column.

individuals_on_claim: There are 15 distinct values in this column ranging from 1 to 15, which is consistent with the description given. There are no missing values. No changes were made to this column.

linked_cases: All the values in this column were either TRUE, FALSE, or missing. There are 26 missing values. All missing values were replaced with FALSE.

cause: After cleaning, this column has three distinct categories, ‘meat’, ‘unknown’, and ‘vegetable’, which match those stated in the description. There are no missing values. However, the raw data had two additional values that do not match the description, ‘Meat’ and ‘VEGETABLE’. These were renamed to ‘meat’ and ‘vegetable’ respectively to match the description.
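The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the sample rows, the `clean_rows` helper, and the exact text format of the raw values are assumptions, not the real data or the tool actually used for this submission.

```python
from statistics import median

# Hypothetical raw rows (illustrative values only, not taken from the real data).
rows = [
    {"claim_amount": "R$ 74474.55", "amount_paid": "51231.37", "linked_cases": "FALSE", "cause": "unknown"},
    {"claim_amount": "R$ 52137.83", "amount_paid": "", "linked_cases": "", "cause": "Meat"},
    {"claim_amount": "R$ 24447.20", "amount_paid": "23986.30", "linked_cases": "TRUE", "cause": "VEGETABLE"},
    {"claim_amount": "R$ 11000.00", "amount_paid": "10000.00", "linked_cases": "FALSE", "cause": "meat"},
]


def clean_rows(rows):
    """Apply the Task 1 cleaning rules to a list of raw row dictionaries."""
    # Median of the non-missing amount_paid values, used to fill the gaps.
    paid = [float(r["amount_paid"]) for r in rows if r["amount_paid"].strip()]
    paid_median = round(median(paid), 2)

    cleaned = []
    for r in rows:
        cleaned.append({
            # Remove the 'R$' symbol and convert text to a decimal number.
            "claim_amount": round(float(r["claim_amount"].replace("R$", "").strip()), 2),
            # Replace missing amount_paid with the median of the remaining data.
            "amount_paid": round(float(r["amount_paid"]), 2) if r["amount_paid"].strip() else paid_median,
            # Missing linked_cases values become False.
            "linked_cases": r["linked_cases"].strip().upper() == "TRUE",
            # 'Meat' -> 'meat', 'VEGETABLE' -> 'vegetable'.
            "cause": r["cause"].strip().lower(),
        })
    return cleaned


cleaned = clean_rows(rows)
for row in cleaned:
    print(row)
```

Lower-casing the whole cause column, rather than renaming the two values one by one, is a design choice in this sketch; it gives the same result for the values reported above.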

Task 2

Create a visualization that shows the number of claims in each location. Use the visualization to:

  • State which category of the variable location has the most observations
  • Explain whether the observations are balanced across categories of the variable location

SOLUTION

There are four locations in this data. The location with the most observations (claims) is RECIFE, followed by SAO LUIS, FORTALEZA, and NATAL, in that order. The claims are unbalanced across locations, with most observations in either RECIFE or SAO LUIS. The legal team should focus on replying to and closing cases faster in RECIFE, since it has the most claims.
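As a rough sketch of how the counts behind such a chart could be produced, the snippet below tallies claims per location and prints a simple text bar chart. The `locations` list is a made-up sample, so the counts and shares it prints do not reflect the real data.

```python
from collections import Counter

# Hypothetical location values (illustrative only, not the real data).
locations = ["RECIFE", "SAO LUIS", "RECIFE", "FORTALEZA", "NATAL", "RECIFE", "SAO LUIS"]

counts = Counter(locations)
ranked = counts.most_common()  # locations sorted from most to fewest claims
total = sum(counts.values())

# Share of all claims per location; equal shares would mean balanced categories.
shares = {loc: n / total for loc, n in ranked}

for loc, n in ranked:
    print(f"{loc:<10} {'#' * n} {n} ({shares[loc]:.0%})")
```

Comparing each share with the share expected under perfect balance (25% for four locations) is one simple way to judge whether the categories are balanced.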

Task 3

Describe the distribution of time to close for all claims. Your answer must include a visualization that shows the distribution.

SOLUTION

Since the head of the legal department wants to see how each location differs in the time it takes to close claims, we should first look at how time to close (in days) is distributed across all claims. The chart above shows that most claims are closed in between 100 and 300 days. The distribution of time to close is right-skewed: some outlying claims take more than 300 days to close, although this is uncommon. To improve the time taken to reply to customers and close claims, the legal team across the four locations should focus on the claims that take less than 300 days to close, while also taking into consideration the claims that take more than 300 days.
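The shape described above can also be checked numerically. The sketch below is an assumption-laden illustration: the `times` list is invented, and the 1.5 × IQR rule for flagging outliers is a common convention chosen here, not necessarily the rule behind the chart in this report.

```python
from statistics import mean, median, quantiles

# Hypothetical time_to_close values in days (illustrative only).
times = [110, 150, 160, 175, 180, 190, 200, 220, 260, 420]

avg = mean(times)
mid = median(times)

# A mean above the median is a quick hint of right skew.
right_skew_hint = avg > mid

# Quartiles and the 1.5 * IQR fences (assumed outlier convention).
q1, q2, q3 = quantiles(times, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [t for t in times if t < lower_fence or t > upper_fence]

print(f"mean={avg}, median={mid}, right skew hint={right_skew_hint}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, outliers={outliers}")
```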

Task 4

Describe the relationship between time to close and location. Your answer must include a visualization to demonstrate the relationship.

SOLUTION

Although RECIFE is the location with the largest number of claims, its interquartile range for time to close is about the same as that of the other locations. This suggests that the mean values are also in a similar range, although the mean for SAO LUIS is higher than the others because it has more values above 400. The chart shows that time to close falls in the same range (0-400 days) across all locations, except for a few outliers above 400 days: three from SAO LUIS, two from FORTALEZA, one from RECIFE, and none from NATAL. Based on the chart above, we cannot say for certain whether location affects the time to close claims. We would recommend that the legal team focus on RECIFE, which has the most claims within the 0-400 range, while keeping an open mind to including other locations. Further analysis should be done to understand whether the location of the legal team's office really does affect the time taken to close claims.
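The per-location comparison above could be backed by a small summary table like the one sketched here. The `claims` pairs are invented for illustration, and the 400-day threshold simply mirrors the cut-off mentioned in the text.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (location, time_to_close) pairs (illustrative only).
claims = [
    ("RECIFE", 150), ("RECIFE", 170), ("RECIFE", 180), ("RECIFE", 200), ("RECIFE", 410),
    ("SAO LUIS", 160), ("SAO LUIS", 185), ("SAO LUIS", 190), ("SAO LUIS", 420), ("SAO LUIS", 430),
    ("NATAL", 140), ("NATAL", 175), ("NATAL", 185), ("NATAL", 210),
    ("FORTALEZA", 155), ("FORTALEZA", 180), ("FORTALEZA", 195), ("FORTALEZA", 405),
]

# Group the times by location.
by_location = defaultdict(list)
for location, days in claims:
    by_location[location].append(days)

# Summarise each location: count, median, IQR, and claims above 400 days.
summary = {}
for location, days in by_location.items():
    q1, _, q3 = quantiles(days, n=4)
    summary[location] = {
        "n": len(days),
        "median": median(days),
        "iqr": q3 - q1,
        "over_400": sum(1 for d in days if d > 400),
    }

for location, stats in summary.items():
    print(location, stats)
```

Similar medians and interquartile ranges across locations would support the reading that location alone does not clearly drive time to close.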

✅ When you have finished...

  • Publish your Workspace using the option on the left
  • Check the published version of your report:
    • Can you see everything you want us to grade?
    • Are all the graphics visible?
  • Review the grading rubric. Have you included everything that will be graded?
  • Head back to the Certification Dashboard to submit your practical exam