Food Claims Process Case Study

Company Background

The company is a fast food chain in Brazil with over 200 outlets. As with many fast food establishments, customers make claims against the company.

For example, they blame the fast food chain for suspected food poisoning.

The legal team, which processes these claims, is currently split across four locations.

In this notebook, we examine whether the time it takes to close claims differs across the locations.

Customer Questions:

1. How does the number of claims differ across locations?

2. What is the distribution of time to close claims?

3. How does the average time to close claims differ by location?

The Dataset

The dataset contains one row for each claim.

Column Name | Criteria
Claim ID | Character, the unique identifier of the claim.
Time to Close | Numeric, number of days it took for the claim to be closed.
Claim Amount | Numeric, initial claim value in the currency of Brazil. For example, “R$50,000.00” should be converted into 50000.
Amount Paid | Numeric, total amount paid after the claim closed in the currency of Brazil.
Location | Character, location of the claim, one of “RECIFE”, “SAO LUIS”, “FORTALEZA”, or “NATAL”.
Individuals on Claim | Numeric, number of individuals on this claim.
Linked Cases | Binary/Boolean, whether this claim is believed to be linked with other cases, either TRUE or FALSE.
Cause | Character, the cause of the food poisoning injuries, one of ‘vegetable’, ‘meat’, or ‘unknown’. Replace any empty rows with ‘unknown’.
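
The categorical criteria listed above lend themselves to an automated check. The following is a minimal sketch and not part of the original notebook: the helper check_categories and the toy DataFrame are hypothetical, and only the allowed values are taken from the column descriptions.

```python
import pandas as pd

# Allowed values taken from the column descriptions above.
ALLOWED_VALUES = {
    "Location": {"RECIFE", "SAO LUIS", "FORTALEZA", "NATAL"},
    "Cause": {"vegetable", "meat", "unknown"},
}


def check_categories(df: pd.DataFrame) -> dict:
    """Return, per column, the set of values that the criteria do not allow."""
    unexpected = {}
    for column, allowed in ALLOWED_VALUES.items():
        # Drop missing values so they are not reported as categories.
        found = set(df[column].dropna().unique())
        unexpected[column] = found - allowed
    return unexpected


# Toy data (hypothetical), for illustration only.
toy = pd.DataFrame(
    {
        "Location": ["RECIFE", "NATAL", "SAO LUIS"],
        "Cause": ["meat", None, "fish"],
    }
)
print(check_categories(toy))
```

An empty set for a column means every non-missing value matches the criteria. Missing values are deliberately ignored here, since they are handled separately during cleaning.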

Data Validation Summary:

After loading the data, we inspected the columns, their data types, and the number of null (missing) values in each, and computed standard summary statistics to get a numerical overview of the data.

Two columns needed cleaning to ensure that they could be used later on in the analysis.

The values in the "Claim Amount" column were stored as strings with a leading "R$" (Brazilian real) and commas as thousands separators. The currency symbol and the commas were removed, and the values were converted to floats.

The second column that needed cleaning was "Cause", which had NaN values where the cause of the claim was unknown. These were replaced with the string "unknown" so that the meaning of a missing value is explicit and the column can be used in visualizations later in the analysis if necessary.

# import required packages
import numpy as np
import pandas as pd
import seaborn as sns
claims = pd.read_csv('claims.csv') # Load in the csv into a dataframe: claims
claims.info() # Overview of columns, column names, row count, null values, and data types
claims.describe() # Standard summary statistics (mean, std, etc.) for the numeric columns
claims.head() # Look at the first 5 rows of the dataframe

N.B. Claim Amount holds string values; these need to be converted to numbers to allow for analysis and visualization.

# change claim amount to float
claims['Claim Amount'] = claims['Claim Amount'].str.replace(r"R\$|,", '', regex=True) # remove leading R$ and thousands-separator commas (raw string avoids an invalid-escape warning)
claims.head() # take a look at the first five rows to check
claims["Claim Amount"] = claims['Claim Amount'].astype(float) #coerce to float values
claims.dtypes # check if type has changed to float
claims.head()
claims['Cause'] = claims['Cause'].fillna('unknown') # change nan values in Cause column to 'unknown' (avoids chained assignment)
claims.head()
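
The two cleaning steps above can also be bundled into a single function, which makes them easy to re-run and to test on a small sample. This is a minimal sketch, assuming the same column names as the dataset; the function clean_claims and the sample rows are hypothetical and not part of the original notebook.

```python
import pandas as pd


def clean_claims(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the two cleaning steps to a copy of df and return the copy."""
    cleaned = df.copy()
    # Strip the "R$" prefix and thousands separators, then convert to float.
    cleaned["Claim Amount"] = (
        cleaned["Claim Amount"]
        .str.replace(r"R\$|,", "", regex=True)
        .astype(float)
    )
    # Missing causes become the explicit category "unknown".
    cleaned["Cause"] = cleaned["Cause"].fillna("unknown")
    return cleaned


# Sample rows (hypothetical), for illustration only.
sample = pd.DataFrame(
    {
        "Claim Amount": ["R$50,000.00", "R$1,250.50"],
        "Cause": ["meat", None],
    }
)
cleaned = clean_claims(sample)
print(cleaned.dtypes)
print(cleaned)
```

Working on a copy leaves the loaded data untouched, so the raw and cleaned versions can be compared if a conversion looks suspicious.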