DS Professional Cert Prep
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LinearRegression
Data Validation
df = pd.read_csv("https://s3.amazonaws.com/talent-assets.datacamp.com/toyota.csv")
df.head()
Check data quality: missing values and column types
df.isna().sum()
df.info()
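The two checks above can also be combined into one table, which is easier to scan than separate outputs. The following is a minimal sketch: `summarize_quality` is a hypothetical helper (not part of pandas), and `toy` is a small made-up frame standing in for `df` so the example runs offline.

```python
import numpy as np
import pandas as pd


def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing share, unique count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "pct_missing": frame.isna().mean().mul(100).round(2),
        "n_unique": frame.nunique(),  # NaN is excluded from the unique count
    })


# Illustrative data only; these values are not taken from toyota.csv.
toy = pd.DataFrame({
    "model": ["Yaris", "Aygo", None, "Yaris"],
    "price": [9000.0, 7500.0, 12000.0, np.nan],
})

quality = summarize_quality(toy)
print(quality)
```

Calling `summarize_quality(df)` on the real data would give the same per-column summary in a single cell.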
Convert some columns to categorical (year is left unchanged in the code below)
cat_cols = ["model", "transmission", "fuelType", "engineSize"]
df[cat_cols] = df[cat_cols].astype('category')
df.info()
df.nunique()
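The conversion code in this section only touches the categorical columns. If `year` should also become a datetime, one possible approach is sketched below. It assumes `year` holds four-digit integer years; `toy` and the `year_dt` column name are hypothetical and used only for illustration.

```python
import pandas as pd

# Illustrative data only; assumes year is stored as a four-digit integer.
toy = pd.DataFrame({"year": [2016, 2019, 2020]})

# Parse the year as a string with an explicit format, so each value
# maps to 1 January of that year rather than being read as an epoch offset.
toy["year_dt"] = pd.to_datetime(toy["year"].astype(str), format="%Y")

print(toy.dtypes)
```

Whether this conversion is worthwhile depends on the later analysis: for a regression on car price, the integer year (or a derived age) is often easier to use directly.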
df.transmission.value_counts()
df.model.value_counts()
df.fuelType.value_counts()
df.engineSize.value_counts()
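The per-column `value_counts()` calls above can be collapsed into a loop over the categorical columns. This is a minimal sketch on a made-up frame (`toy`) with two of the column names from `cat_cols`; the values are illustrative, not taken from the dataset.

```python
import pandas as pd

# Illustrative data only.
toy = pd.DataFrame({
    "transmission": ["Manual", "Automatic", "Manual", "Manual"],
    "fuelType": ["Petrol", "Hybrid", "Petrol", "Diesel"],
})
cat_cols = ["transmission", "fuelType"]
toy = toy.astype({col: "category" for col in cat_cols})

# Collect the frequency table of every categorical column in one pass.
counts = {col: toy[col].value_counts() for col in cat_cols}

for col, vc in counts.items():
    print(f"--- {col} ---")
    print(vc)
```

Scanning these tables is a quick way to spot misspelled or unexpected category labels before modelling.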
df.describe()
Exploratory Data Analysis
How is our target distributed?