Practical exam

First, let's import pandas, read the dataset, and view the first rows

import pandas as pd
import numpy as np

df = pd.read_csv("electric_bike_ratings_2212.csv")

df.head()

Let's check the data type of each column

df.dtypes

The "reviewer_age" column has dtype object, but it should be an integer

df.reviewer_age.unique()
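When a column has many distinct values, scanning the output of unique() by eye can miss the odd entry. As a rough sketch, the non-numeric entries can be isolated with pd.to_numeric. The Series below is made-up toy data standing in for df["reviewer_age"], and the names ages, parsed and non_numeric are only illustrative.

```python
import pandas as pd

# Toy data standing in for df["reviewer_age"]; the values are made up.
ages = pd.Series(["23", "41", "-", "35", "-"])

# With errors="coerce", anything that cannot be parsed as a number becomes NaN.
parsed = pd.to_numeric(ages, errors="coerce")

# Keep only the original entries that failed to parse.
non_numeric = ages[parsed.isna()].unique().tolist()
print(non_numeric)
```

Note that genuinely missing values (NaN) in the original column would show up in this list as well.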

It contains a "-" value, so first calculate the average age without it

average_age = df[df["reviewer_age"] != "-"]["reviewer_age"].astype("int32").mean()

print(average_age)

Finally, replace the "-" values with the average and convert the column to int (the cast to int32 drops the fractional part of the average)

df["reviewer_age"] = df["reviewer_age"].replace("-",average_age).astype("int32")
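An alternative way to do the same cleaning, shown here only as a sketch on made-up toy data, is to let pd.to_numeric turn the "-" entries into NaN and then fill them with the rounded mean. Rounding instead of truncating is an assumption of this sketch, not something the exam requires, and the names ages, numeric_ages, mean_age and clean_ages are illustrative.

```python
import pandas as pd

# Toy data standing in for df["reviewer_age"]; the values are made up.
ages = pd.Series(["20", "31", "-", "41"])

# "-" cannot be parsed, so it becomes NaN.
numeric_ages = pd.to_numeric(ages, errors="coerce")

# mean() skips NaN by default, so only the valid ages are averaged.
mean_age = numeric_ages.mean()

# Fill the gaps with the rounded mean, then cast to a 32-bit integer.
clean_ages = numeric_ages.fillna(round(mean_age)).astype("int32")
print(clean_ages.tolist())
```

With these toy values the mean is about 30.67, so the missing entry is filled with 31, whereas a plain cast to int would give 30.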

Here we can see how many missing values each column has

df.isna().sum()

There are 150 missing values in the "web_browser" column, so we will replace them with "unknown"

df["web_browser"] = df["web_browser"].fillna("unknown")
# Check the missing values again
df.isna().sum()
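To confirm that the fill behaved as expected, it can help to look at the category counts as well as the missing-value totals. Here is a small sketch on a made-up toy frame; the browser names and the variable names toy and counts are only illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; the browser names are made up.
toy = pd.DataFrame({"web_browser": ["Chrome", np.nan, "Firefox", np.nan]})

# Replace missing browsers with the "unknown" label.
toy["web_browser"] = toy["web_browser"].fillna("unknown")

# The filled rows now appear as their own category.
counts = toy["web_browser"].value_counts()
print(counts)
```

On the real dataset, the "unknown" count should match the number of missing values reported before the fill.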

Next, we need to check the "review_month" column, because we could see that it needs cleaning