Associate Data Analyst Case Study Project - With Python
  • DataCamp Associate Data Analyst Case Study Project - Food Claims Process

    BY ABDULRAHEEM BASHIR

    Table of Contents

    • Introduction
    • Importing required files and libraries
    • Data Inspection
    • Data Cleaning
    • Sanity Check After Data Cleaning
    • Data Exploration
    • Further Data Exploration
    • Conclusions

    This case study is about a fast food chain in Brazil whose customers file claims against it, for example over suspected food poisoning. The chain is referred to as Vivendo throughout this case study.

    Vivendo is a fast food chain in Brazil with over 200 outlets. As with many fast food establishments, customers make claims against the company. For example, they blame Vivendo for suspected food poisoning.

    The legal team, which processes these claims, is currently split across four locations. The new head of the legal department wants to know whether the time it takes to close claims differs across those locations.

    Customer Question: The legal team would like you to answer the following questions:

    • How does the number of claims differ across locations?
    • What is the distribution of time to close claims?
    • How does the average time to close claims differ by location?

    Dataset: The dataset contains one row for each claim. The dataset can be downloaded from here.

    The dataset contains the following columns:

    • Claim ID: Character, the unique identifier of the claim.
    • Time to Close: Numeric, number of days it took for the claim to be closed.
    • Claim Amount: Numeric, initial claim value in the currency of Brazil.
    • Amount Paid: Numeric, total amount paid after the claim closed in the currency of Brazil.
    • Location: Character, location of the claim, one of “RECIFE”, “SAO LUIS”, “FORTALEZA”, or “NATAL”.
    • Individuals on Claim: Numeric, number of individuals on this claim.
    • Linked Cases: Binary, whether this claim is believed to be linked with other cases, either TRUE or FALSE.
    • Cause: Character, the cause of the food poisoning injuries, one of ‘vegetable’, ‘meat’, or ‘unknown’.
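
    The column descriptions above can double as a lightweight schema check. The sketch below is only an illustration: check_schema is a hypothetical helper, and the toy rows are invented rather than taken from claims.csv. Only the column names and the allowed Location and Cause values come from the description.

```python
import pandas as pd

# Column names and allowed values taken from the dataset description.
EXPECTED_COLUMNS = [
    "Claim ID", "Time to Close", "Claim Amount", "Amount Paid",
    "Location", "Individuals on Claim", "Linked Cases", "Cause",
]
ALLOWED_LOCATIONS = {"RECIFE", "SAO LUIS", "FORTALEZA", "NATAL"}
ALLOWED_CAUSES = {"vegetable", "meat", "unknown"}


def check_schema(df):
    """Hypothetical helper: return a list of problems found in df."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if "Location" in df.columns:
        bad = set(df["Location"].dropna().unique()) - ALLOWED_LOCATIONS
        if bad:
            problems.append(f"unexpected Location values: {sorted(bad)}")
    if "Cause" in df.columns:
        bad = set(df["Cause"].dropna().unique()) - ALLOWED_CAUSES
        if bad:
            problems.append(f"unexpected Cause values: {sorted(bad)}")
    return problems


# Toy rows invented for illustration only; they are not from claims.csv.
toy = pd.DataFrame({
    "Claim ID": ["A-1", "A-2"],
    "Time to Close": [100, 250],
    "Claim Amount": ["R$10,000.00", "R$5,000.00"],
    "Amount Paid": [8000.0, 4000.0],
    "Location": ["RECIFE", "Natal"],  # "Natal" is deliberately mis-cased
    "Individuals on Claim": [1, 2],
    "Linked Cases": [False, True],
    "Cause": [None, "meat"],  # missing causes are ignored by the check
})
problems = check_schema(toy)
print(problems)
```

    Returning a list of problems instead of raising on the first one makes it possible to see every schema issue in a single pass.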

    Importing required files and libraries

    # import all packages and set plots to be embedded inline
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    %matplotlib inline
    # Reading the csv file
    # saving it as a dataframe with the name claims
    
    claims = pd.read_csv('claims.csv')

    Data Inspection

    In this section, the data is checked for quality and tidiness issues.

    # display the first few rows of the dataframe
    
    claims.head()

    The preceding output shows that:

    • The Claim ID column contains some undesired zeros. It also includes two pieces of information in this single column: the Claim ID and the Year of Claim.
    • Some unwanted characters appear before the amount in the Claim Amount column.
    • The Cause column has some empty values.
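
    One way to confirm the observation about the Claim Amount column is to test whether each entry starts with a digit. The following is a minimal sketch on invented values: the "R$" prefix and the comma thousands separators are assumptions about the formatting, not something confirmed by the output above.

```python
import pandas as pd

# Toy values invented for illustration; the "R$" prefix and comma
# thousands separators are assumed, not taken from claims.csv.
toy = pd.DataFrame({"Claim Amount": ["R$50,000.00", "R$1,250.50", "R$300.00"]})

# Flag entries whose first character is not a digit.
has_prefix = ~toy["Claim Amount"].str.match(r"\d")
print(int(has_prefix.sum()))

# Drop every character that is not a digit or a decimal point, then convert.
cleaned = toy["Claim Amount"].str.replace(r"[^\d.]", "", regex=True).astype(float)
print(cleaned.tolist())
```

    Stripping everything except digits and the decimal point avoids hard-coding a particular currency symbol, at the cost of silently discarding any other stray characters.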

    # display summary information about the dataframe
    
    claims.info()

    The preceding output shows that:

    • The Claim Amount column does not have a numeric datatype, although it holds monetary amounts.
    • Approximately 80% of the Cause column entries are null.
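
    Both observations can be checked numerically rather than read off the info() summary. The sketch below uses a small invented frame (not claims.csv) to show the share of nulls per column and a test for whether a column is numeric.

```python
import pandas as pd

# Toy frame invented for illustration; four of the five Cause entries are missing.
toy = pd.DataFrame({
    "Claim Amount": ["R$10,000.00", "R$5,000.00", "R$750.00", "R$1,200.00", "R$90.00"],
    "Cause": [None, None, "meat", None, None],
})

# isna() marks missing cells; the column mean is the fraction that is null.
null_share = toy.isna().mean()
print(null_share)

# True only for integer, float, or similar numeric dtypes.
amount_is_numeric = pd.api.types.is_numeric_dtype(toy["Claim Amount"])
print(amount_is_numeric)
```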

    # count fully duplicated rows in the dataframe
    claims.duplicated().sum()

    The output above shows that the dataframe contains no duplicate rows.
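
    Note that duplicated() with no arguments only flags rows that match in every column. Since Claim ID is described as a unique identifier, it may also be worth checking that column on its own. The sketch below uses invented rows to show the difference between the two checks.

```python
import pandas as pd

# Toy rows invented for illustration: the ID "A-2" appears twice,
# but the two rows differ in Time to Close.
toy = pd.DataFrame({
    "Claim ID": ["A-1", "A-2", "A-2"],
    "Time to Close": [100, 250, 260],
})

# Rows identical in every column.
full_row_dupes = int(toy.duplicated().sum())

# Rows that repeat an earlier Claim ID, regardless of the other columns.
id_dupes = int(toy.duplicated(subset=["Claim ID"]).sum())

print(full_row_dupes, id_dupes)
```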

    # display descriptive statistics for the numeric columns
    
    claims.describe()

    According to the above output, the minimum time to close a claim is -57 days, which is invalid because a duration cannot be negative.
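
    To see how many rows are affected, the negative values can be isolated with a boolean mask. This is a minimal sketch on invented values; only the -57 comes from the output above.

```python
import pandas as pd

# Toy values invented for illustration, apart from the -57 reported by describe().
toy = pd.DataFrame({"Time to Close": [120, -57, 365, 0, 980]})

# Select the rows whose duration is below zero.
negative = toy[toy["Time to Close"] < 0]
print(len(negative))
print(negative)
```

    Inspecting the flagged rows before changing them helps decide whether the values should be dropped, corrected, or treated as missing.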