import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LinearRegression

Data Validation

df = pd.read_csv("https://s3.amazonaws.com/talent-assets.datacamp.com/toyota.csv")
df.head()

Check data quality: missing values and column types

df.isna().sum()
df.info()
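
The checks above can be gathered into one table per column. This is a minimal sketch, not part of the original analysis: `quality_report` is a hypothetical helper, and the sample rows are invented for illustration rather than taken from toyota.csv.

```python
import pandas as pd

# Hypothetical sample rows for illustration only; not taken from toyota.csv.
sample = pd.DataFrame({
    "model": ["Yaris", "Aygo", "Yaris", None],
    "year": [2016, 2017, 2016, 2019],
    "price": [9000, 7500, 9000, 12000],
})

def quality_report(frame):
    """Summarize dtype, missing values, and distinct values for each column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),  # NaN is not counted as a distinct value
    })

report = quality_report(sample)

# Fully duplicated rows are a separate check from missing values.
n_duplicates = int(sample.duplicated().sum())

print(report)
print("duplicate rows:", n_duplicates)
```

Calling `quality_report(df)` on the real data would give the same three columns; whether duplicate rows should be dropped depends on how the listings were collected.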

Convert year to datetime and convert some columns to categorical
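
One way the year conversion might look is sketched below. It assumes `year` holds four-digit integers; the `year_dt` and `age` column names and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from toyota.csv.
sample = pd.DataFrame({"year": [2016, 2017, 2019]})

# Parse each year as a timestamp at January 1 of that year.
# Casting to str first makes the "%Y" format unambiguous.
sample["year_dt"] = pd.to_datetime(sample["year"].astype(str), format="%Y")

# A numeric age relative to the newest year in the data, as an alternative feature.
sample["age"] = sample["year"].max() - sample["year"]

print(sample)
```

For a linear regression, a numeric column such as the raw year or the derived age is usually easier to feed to the model than a datetime, so the datetime version is mainly useful for plotting and date arithmetic.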



cat_cols = ["model", "transmission", "fuelType", "engineSize"]
df[cat_cols] = df[cat_cols].astype('category')
df.info()
df.nunique()
df.transmission.value_counts()
df.model.value_counts()
df.fuelType.value_counts()
df.engineSize.value_counts()
df.describe()
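
By default `describe()` summarizes only numeric columns, so columns cast to `category` (including `engineSize` here) drop out of its output. The sketch below shows how both views can be obtained; the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows for illustration only; not taken from toyota.csv.
sample = pd.DataFrame({
    "model": ["Yaris", "Aygo", "Yaris"],
    "engineSize": [1.0, 1.0, 1.5],
    "price": [9000, 7500, 11000],
})
cat_cols = ["model", "engineSize"]
sample[cat_cols] = sample[cat_cols].astype("category")

# Numeric columns only: count, mean, std, min, quartiles, max.
numeric_summary = sample.describe()

# Categorical columns only: count, unique, top (most frequent), freq.
categorical_summary = sample.describe(include=["category"])

print(numeric_summary)
print(categorical_summary)
```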

Exploratory Data Analysis

How is our target distributed?