Data Wrangling

This workbook practices data wrangling on a dataset from the IBM Data Analyst course on Coursera.

Import Basic Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Ignore Future Warnings

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
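The filter above silences every FutureWarning for the rest of the session. As a minimal sketch of a narrower alternative, the same filter can be scoped to a single block with `warnings.catch_warnings()`, which restores the previous filters on exit. The `noisy_step` function is a hypothetical stand-in for a library call that emits a FutureWarning; it is not part of the workbook.

```python
import warnings


def noisy_step():
    # Hypothetical helper: stands in for any call that raises a FutureWarning.
    warnings.warn("this behaviour will change in a future release", FutureWarning)
    return "done"


# Suppress FutureWarning only inside this block.
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=FutureWarning)
    status = noisy_step()

print(status)
```

Outside the `with` block the warning filters are back to what they were, so later FutureWarnings remain visible.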

Import the Data from the URL below

filepath = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/Data%20files/auto.csv"

Create DataFrame df and View the First 5 Rows

# The file has no header row, so header=None keeps the first line as data
df = pd.read_csv(filepath, header=None)
df.head()

Assign Column Headers

headers = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location", "wheel-base",
    "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg", "price",
]
df.columns = headers
df.head()
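Reading the file and assigning headers can also be done in one call by passing the labels through the `names` parameter of `pd.read_csv`. The sketch below uses a hypothetical two-row, four-column sample held in memory instead of the real file, so `sample_csv`, `sample_headers`, and `sample_df` are illustrative names only.

```python
import io

import pandas as pd

# Hypothetical headerless sample (not the real auto.csv contents).
sample_csv = "3,alfa-romero,gas,13495\n2,audi,gas,13950\n"
sample_headers = ["symboling", "make", "fuel-type", "price"]

# header=None treats the first line as data; names= supplies the column labels.
sample_df = pd.read_csv(io.StringIO(sample_csv), header=None, names=sample_headers)
print(sample_df.head())
```

With the real dataset, the same pattern would presumably be `pd.read_csv(filepath, header=None, names=headers)`.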

Identify and Handle Missing Values

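# The dataset marks missing entries with "?"; convert them to NaN
df = df.replace("?", np.nan)
df.head()
# Count missing values per column and keep only columns that have any
missing_data = df.isnull().sum()
missing_data = missing_data[missing_data > 0]
# Present the counts as a two-column table
missing_data = pd.DataFrame(missing_data, columns=["Missing Values"]).reset_index()
missing_data = missing_data.rename(columns={"index": "Column Name"})
missing_data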
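The cell above only counts the gaps. As a rough sketch of one common way to handle them, and not necessarily the approach this workbook takes, numeric columns can be filled with the column mean, categorical columns with the most frequent value, and rows missing the target dropped. The `toy` frame below is hypothetical data that borrows three column names from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the real auto dataset.
toy = pd.DataFrame({
    "normalized-losses": ["164", "?", "158"],
    "num-of-doors": ["four", "?", "four"],
    "price": ["13950", "17450", "?"],
})
toy = toy.replace("?", np.nan)

# Numeric column: convert to numbers, then fill gaps with the column mean.
toy["normalized-losses"] = pd.to_numeric(toy["normalized-losses"])
toy["normalized-losses"] = toy["normalized-losses"].fillna(toy["normalized-losses"].mean())

# Categorical column: fill gaps with the most frequent value.
toy["num-of-doors"] = toy["num-of-doors"].fillna(toy["num-of-doors"].mode()[0])

# Target column: drop rows where the value is missing.
toy = toy.dropna(subset=["price"]).reset_index(drop=True)
print(toy)
```

Which strategy suits each column depends on how the column is used later; mean imputation, for example, would distort a heavily skewed variable.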

Check Data Types