Certification - Data Analyst Associate - Food Claims

Data Analyst Associate Practical Exam Submission

You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.

Task 1

For every column in the data:

  • State whether the values match the description given in the table above.

  • State the number of missing values in the column.

  • Describe what you did to make values match the description if they did not match.

claim_id: There were 2000 unique values in the column. There were no missing values. No changes were made to this column.

time_to_close: The values of this column ranged from 76 to 518, which is consistent with the description given in the table. There were no missing values, so no changes were made to this column.

claim_amount: The values match the criteria given in the table, with the Brazilian real currency symbol (R$) attached to the values. There were no missing values, so no changes were made to this column.

amount_paid: The values match the criteria given in the table, except that the Brazilian real currency symbol (R$) was not attached to the values, so it was added. There were 36 missing values in the column; they were replaced with the median of the remaining values, which was 20105.70.

location: This column has four categories, which match the description given in the table. There were no missing values and no changes were made to this column.

individuals_on_claim: The values of this column ranged between 1 and 15, which is consistent with the description given in the table. There were no missing values, so no changes were made to this column.

linked_case: The values in this column were either TRUE or FALSE. There were 26 missing values in the column. All missing values were replaced with FALSE.

cause: This column had four categories instead of the three described in the table, because one entry (VEGETABLES) did not match the description. There were no missing values in the column. The 16 unmatched VEGETABLES entries were changed to vegetable.
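
The cleaning steps described above could be sketched in pandas roughly as follows. This is a minimal illustration, not the code used for the submission: the helper name `clean_claims` and the small sample frame are hypothetical, and the sample values are made up. Only the column names and the three rules (median fill for amount_paid, FALSE for missing linked_case, VEGETABLES mapped to vegetable) come from the write-up.

```python
import pandas as pd


def clean_claims(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the Task 1 cleaning rules to a copy of the claims data."""
    out = df.copy()

    # amount_paid: replace missing values with the median of the observed values.
    median_paid = out["amount_paid"].median()
    out["amount_paid"] = out["amount_paid"].fillna(median_paid)

    # linked_case: treat missing values as FALSE, then store as plain booleans.
    out["linked_case"] = (
        out["linked_case"].astype("boolean").fillna(False).astype(bool)
    )

    # cause: fold the stray VEGETABLES label into the expected category.
    out["cause"] = out["cause"].replace({"VEGETABLES": "vegetable"})
    return out


# Made-up sample rows, for illustration only (not the exam data).
raw = pd.DataFrame(
    {
        "claim_id": [1, 2, 3, 4],
        "amount_paid": [40000.0, None, 8000.0, 15000.0],
        "linked_case": [True, None, False, None],
        "cause": ["meat", "VEGETABLES", "vegetable", "unknown"],
    }
)

missing_before = raw.isna().sum()  # missing values per column before cleaning
cleaned = clean_claims(raw)
print(missing_before)
print(cleaned)
```

Counting missing values before cleaning, as `missing_before` does here, is what produces the per-column figures reported above (36 for amount_paid, 26 for linked_case) when run on the full data set.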

Task 2

Create a visualization that shows the number of claims in each location. Use the visualization to:

  • State which category of the variable location has the most observations
  • Explain whether the observations are balanced across categories of the variable location

2a. Recife has the most observations, with 885 claims.

2b. The observations are not balanced across locations. Recife has by far the most claims (885), followed by a sharp drop to Sao Luis with 517 claims. Fortaleza and Natal are lower still, with similar counts of 311 and 287 claims respectively.
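
The imbalance can be quantified from the counts reported above. The sketch below hard-codes those four counts rather than reading the data file; with the full data set loaded into a DataFrame (assumed here to be called `df`), the same Series would come from `df["location"].value_counts()`.

```python
import pandas as pd

# Claim counts per location, as reported in the answer above.
counts = pd.Series(
    {"Recife": 885, "Sao Luis": 517, "Fortaleza": 311, "Natal": 287},
    name="claims",
)

# Share of all claims held by each location, in percent.
share_pct = counts / counts.sum() * 100

# Ratio between the largest and smallest category as a simple balance check.
imbalance_ratio = counts.max() / counts.min()

summary = pd.DataFrame({"claims": counts, "share_pct": share_pct.round(2)})
print(summary)
print("Largest category:", counts.idxmax())
print("Largest / smallest ratio:", round(imbalance_ratio, 2))

# To draw the bar chart itself (requires matplotlib):
# counts.plot.bar()
```

A ratio well above 1 between the largest and smallest category supports the conclusion that the locations are not balanced.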

Task 3

Describe the distribution of time to close for all claims. Your answer must include a visualization that shows the distribution.

From the histogram, the distribution of time to close is unimodal and roughly symmetric around its peak, but notably right-skewed overall. This skew suggests that the mean closure time exceeds the median: most claims are closed within a similar timeframe, while a right tail represents claims with much longer closure times. The peak, at around 175 to 186 days, marks the most common closure duration.

However, the elongated right tail points to outliers, that is, claims with unusually long closure periods. These outliers drive the right skew and indicate delays or complications in closing certain claims. Claims with extended closure times therefore warrant closer examination, both to identify the causes of delay and to find ways to close claims faster.
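
The claims about skew and outliers can be checked numerically alongside the histogram. The sketch below uses a small made-up sample with a long right tail, not the exam data, and flags long-running claims with the common 1.5 × IQR rule. That rule is an assumption chosen for illustration; the write-up does not say how outliers were identified.

```python
import pandas as pd

# Made-up closure times in days with a long right tail (not the exam data).
time_to_close = pd.Series(
    [150, 165, 172, 178, 180, 183, 186, 190, 205, 260, 340, 518]
)

mean_days = time_to_close.mean()
median_days = time_to_close.median()
skewness = time_to_close.skew()  # positive values indicate right skew

# Flag unusually long closures with the 1.5 * IQR rule (an assumed convention).
q1, q3 = time_to_close.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
long_claims = time_to_close[time_to_close > upper_fence]

print("mean:", mean_days, "median:", median_days)
print("skewness:", round(skewness, 2))
print("claims above the upper fence:", long_claims.tolist())

# To draw the histogram itself (requires matplotlib):
# time_to_close.plot.hist(bins=30)
```

A mean above the median together with positive skewness matches the right-skewed shape described above, and the flagged values are the claims that would merit a closer look.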

Task 4

Describe the relationship between time to close and location. Your answer must include a visualization to demonstrate the relationship.