
My first task is to open and inspect my data.

import pandas as pd 
import matplotlib.pyplot as pyplot
import numpy as np
food = pd.read_csv('food_claims_2212.csv')

food.head()

Next I check whether I am missing any data. Using .isnull(), I discover I have two columns with missing data. I am instructed to replace amount_paid nulls with the median value, and linked_cases nulls with False.

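food.isnull().sum()
# Fill missing amount_paid values with the column median (a number, not a string)
median_paid = food['amount_paid'].median()
food['amount_paid'] = food['amount_paid'].fillna(median_paid)
food.isnull().sum()
# Fill missing linked_cases values with the boolean False
food['linked_cases'] = food['linked_cases'].fillna(False)
food.isnull().sum()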
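
To double-check the fill logic without touching the real file, here is a minimal sketch on a toy DataFrame. The `toy` data and its values are made up for illustration, and the column names are only assumed to mirror the ones in food_claims_2212.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real CSV (values are invented)
toy = pd.DataFrame({
    "amount_paid": [100.0, np.nan, 300.0, 200.0],
    "linked_cases": [True, np.nan, False, np.nan],
})

# Fill numeric nulls with the computed median rather than a hard-coded string
median_paid = toy["amount_paid"].median()
toy["amount_paid"] = toy["amount_paid"].fillna(median_paid)

# Convert to pandas' nullable boolean dtype, fill nulls with False,
# then cast to a plain bool column
toy["linked_cases"] = (
    toy["linked_cases"].astype("boolean").fillna(False).astype(bool)
)

# Total count of nulls left across all columns
remaining_nulls = int(toy.isnull().sum().sum())
print(toy)
print("remaining nulls:", remaining_nulls)
```

Filling with a number and a real boolean keeps amount_paid numeric and linked_cases boolean, which a string such as "20105" or "FALSE" would not.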

Next I will separate the claim amount from the R$ prefix and format it correctly.

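# Strip the leading 'R$ ' characters, then convert the remaining text to a float
food['claim_amount'] = food['claim_amount'].str.lstrip('R$ ').astype(float)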
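
As a quick sanity check of the prefix removal, here is a small sketch on a hand-made Series. The sample values and the "R$ <number>" layout are assumptions for illustration, not rows copied from the dataset.

```python
import pandas as pd

# Hypothetical sample values in the assumed "R$ <number>" format
claims = pd.Series(["R$ 74474.55", "R$ 52137.83", "R$ 24447.2"])

# Remove the literal prefix; regex=False stops '$' being read as a regex anchor
cleaned = claims.str.replace("R$", "", regex=False).str.strip()

# Convert to numbers; errors="coerce" turns anything unparseable into NaN
amounts = pd.to_numeric(cleaned, errors="coerce")
print(amounts)
```

Note that .str.lstrip() expects a string of characters to remove, so passing an integer such as 2 does not drop the first two characters. If a fixed-width slice is really what is wanted, .str[2:] would be the way to do it.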