Data Analyst Associate Certification

Task 1 - clean data, find null values, match to description:

Values for claim_id match the description, there are no missing values.

Values for time_to_close match the description. There are no negative values. There are no null values.

Values for claim_amount are in Brazilian currency. The 'R$' symbol was removed, and the values were converted from data type 'string' to data type 'float' and rounded to two decimal places to match the description. There are no null values.

Values for amount_paid were converted from data type 'string' to data type 'float' and rounded to two decimal places to match the description. There were 36 missing values, which have been replaced with the median amount_paid.

Values for location match the description. There are no typos. There are no null values.

Values for individuals_on_claim match the description. There are no missing values.

Values for linked_cases contained 26 missing values. These have been replaced with 'FALSE', so the column now matches the description.

Values for cause must be one of "vegetable", "meat", or "unknown". There were 14 cases where the values were "Meat" and 16 cases where the values were "VEGETABLES". They have been replaced with the appropriate corresponding values and now match the description. There are no null values.
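
The cleaning steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used for the report: the helper names (`parse_money`, `clean_cause`, `clean_rows`) and the sample rows are hypothetical, the 'R$' symbol is assumed to be the only non-numeric text in the amounts, and mapping unrecognized causes to "unknown" is an assumption.

```python
from statistics import median

def parse_money(raw):
    """Strip 'R$' and convert to a float rounded to 2 places; missing stays None."""
    if raw is None or str(raw).strip() == "":
        return None
    return round(float(str(raw).replace("R$", "").strip()), 2)

def clean_cause(raw):
    """Map variants such as 'Meat' or 'VEGETABLES' onto the allowed categories."""
    value = str(raw).strip().lower()
    if value == "vegetables":
        value = "vegetable"
    # Assumption: anything else that is not an allowed value becomes 'unknown'.
    return value if value in {"vegetable", "meat"} else "unknown"

def clean_rows(rows):
    """Return cleaned copies of the rows without modifying the originals."""
    cleaned = [dict(row) for row in rows]
    for row in cleaned:
        row["claim_amount"] = parse_money(row["claim_amount"])
        row["amount_paid"] = parse_money(row["amount_paid"])
        row["cause"] = clean_cause(row["cause"])
        if row["linked_cases"] is None:
            row["linked_cases"] = False  # missing linked_cases -> FALSE
    # Replace missing amount_paid with the median of the known values.
    known = [row["amount_paid"] for row in cleaned if row["amount_paid"] is not None]
    fill = round(median(known), 2)
    for row in cleaned:
        if row["amount_paid"] is None:
            row["amount_paid"] = fill
    return cleaned

# Hypothetical sample rows, not taken from the real dataset.
rows = [
    {"claim_amount": "R$ 74474.55", "amount_paid": "51231.37", "linked_cases": False, "cause": "unknown"},
    {"claim_amount": "R$ 52137.83", "amount_paid": None, "linked_cases": None, "cause": "Meat"},
    {"claim_amount": "R$ 24447.20", "amount_paid": "23986.30", "linked_cases": True, "cause": "VEGETABLES"},
    {"claim_amount": "R$ 29006.28", "amount_paid": "1000.00", "linked_cases": None, "cause": "meat"},
]
cleaned = clean_rows(rows)
```

The median is computed before any value is filled in, so the imputed values do not influence the statistic used to impute them.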

Task 2 - find which location has the most claims, if claims are balanced across locations:

Location 'RECIFE' has the most claims with a count of 885.

The number of claims is not balanced across locations. 'RECIFE' has the most claims at 885. 'SAO LUIS' has the second most at 517. 'FORTALEZA' has the third most at 311. 'NATAL' has the fewest at 287.
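
As a rough check of the imbalance, the sketch below counts claims per location and converts the counts to percentage shares. The location column is rebuilt from the counts reported above rather than read from the dataset, and the function name `claim_counts` is hypothetical.

```python
from collections import Counter

def claim_counts(locations):
    """Count claims per location, ordered from most to fewest."""
    return Counter(locations).most_common()

# Rebuild a location column from the counts reported above.
reported = {"RECIFE": 885, "SAO LUIS": 517, "FORTALEZA": 311, "NATAL": 287}
locations = [name for name, n in reported.items() for _ in range(n)]

counts = claim_counts(locations)
total = sum(n for _, n in counts)

# Percentage of all claims held by each location.
shares = {name: round(100 * n / total, 2) for name, n in counts}

# A perfectly balanced split would give every location the same count.
balanced_count = total / len(counts)
```

With 2,000 claims in total, a balanced split would put 500 claims in each of the four locations, so 'RECIFE' holds well above an even share while 'FORTALEZA' and 'NATAL' hold well below it.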

Task 3 - describe the distribution of time to close for all claims:

The distribution of time-to-close for all claims is positively skewed, with a mean of 185.568 days, a median of 179.0 days, and a standard deviation of 49.163 days. Half of all claims are resolved within 179 days, or about 6 months.
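
One way to see the positive skew in these figures is that the mean sits above the median. The sketch below quantifies this with Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation. This measure is an illustrative choice and is not necessarily what the report used; the function names and the sample values are hypothetical, and `stdev` computes the sample standard deviation, which is also an assumption about how 49.163 was obtained.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(mean_value, median_value, std_value):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    return 3 * (mean_value - median_value) / std_value

def summarize(values):
    """Return mean, median, sample standard deviation, and skewness coefficient."""
    m, med, sd = mean(values), median(values), stdev(values)
    return {"mean": m, "median": med, "std": sd,
            "skew": pearson_median_skewness(m, med, sd)}

# Coefficient computed from the figures reported above.
reported_skew = pearson_median_skewness(185.568, 179.0, 49.163)

# Hypothetical time-to-close values (days) with one long-running claim.
sample = [120, 150, 170, 180, 190, 210, 400]
summary = summarize(sample)
```

A coefficient above zero indicates a longer right tail. With the reported figures it comes out at roughly 0.4.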

Task 4 - describe relationship between time to close and location:

Time-to-close is relatively balanced across locations despite the varying number of claims. The overall average time-to-close is 185.57 days.

For 'FORTALEZA', which holds 15.55% of total claims (311), about 43% of claims take longer than the overall average time-to-close. It has a minimum time-to-close of 76 days and a maximum of 453 days. The first quartile, median, and third quartile are 157.0, 180.0, and 205.5 days, respectively.

For 'NATAL', which holds 14.35% of total claims (287), about 38% of claims take longer than the overall average time-to-close. It has a minimum time-to-close of 93 days and a maximum of 361 days, the lowest maximum of any location. The first quartile, median, and third quartile are 157.0, 179.0, and 205.5 days, respectively.

For 'RECIFE', which holds 44.25% of total claims (885), about 38% of claims take longer than the overall average time-to-close. It has a minimum time-to-close of 82 days and a maximum of 427 days. The first quartile, median, and third quartile are 157.0, 178.0, and 203.0 days, respectively.

For 'SAO LUIS', which holds 25.85% of total claims (517), about 39% of claims take longer than the overall average time-to-close. It has a minimum time-to-close of 84 days and a maximum of 518 days, the highest maximum of any location. The first quartile, median, and third quartile are 161.0, 179.0, and 205.0 days, respectively.

The average time-to-close is roughly the same for each location:

RECIFE = 184.61 days

FORTALEZA = 185.31 days

NATAL = 185.93 days

SAO LUIS = 187.17 days
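
The per-location figures in this task (share of claims, share above the overall average, quartiles, and mean) could be produced along the following lines. This is a minimal sketch, not the report's actual code: `summarize_by_location` and the sample claims are hypothetical, and the "inclusive" quantile method (linear interpolation between data points) is an assumption about how the quartiles were calculated.

```python
from collections import defaultdict
from statistics import mean, quantiles

def summarize_by_location(claims):
    """Summarize time-to-close per location from (location, days) pairs."""
    claims = list(claims)
    overall_mean = mean(days for _, days in claims)

    # Group the time-to-close values by location.
    groups = defaultdict(list)
    for location, days in claims:
        groups[location].append(days)

    summary = {}
    for location, days in groups.items():
        q1, q2, q3 = quantiles(days, n=4, method="inclusive")
        above = sum(d > overall_mean for d in days)
        summary[location] = {
            "share_of_claims_pct": round(100 * len(days) / len(claims), 2),
            "pct_above_overall_mean": round(100 * above / len(days), 2),
            "min": min(days), "q1": q1, "median": q2, "q3": q3, "max": max(days),
            "mean": round(mean(days), 2),
        }
    return overall_mean, summary

# Hypothetical claims as (location, time_to_close in days); not real data.
claims = [("A", 100), ("A", 200), ("A", 300), ("B", 150), ("B", 250)]
overall_mean, summary = summarize_by_location(claims)
```

Comparing each location against the overall mean, rather than against its own mean, keeps the "takes longer than average" percentages on a common baseline across locations.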