Everyone Can Learn Python Scholarship

1. Data Overview

#Import libraries 
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
%matplotlib inline 
# Import data 
cars = pd.read_csv('data/co2_emissions_canada.csv')
# Overview of data
cars.head()
cars.info()
cars.describe()

2. Data Cleaning

The overview of the data shows what needs to be prepared before analysis. The cleaning steps are:
  1. Rename the columns so they contain no spaces or capital letters.
  2. Convert columns with the object data type to string.
  3. Split "Transmission" so the number of gears goes into a separate column.
  4. Replace the fuel type symbols with the actual fuel type names.
# change column names 
cols = ['make', 'model', 'vehicle_class', 'engine_size_l', 'cylinders', 'transmission', 'fuel_type', 'fuel_cons_comb_l/100km', 'co2_emissions_g/km']
cars.columns = cols
cars.head()
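The renaming above is done with a hand-written list. As a minimal sketch of an alternative, the same rule (no spaces, no capital letters) can be applied programmatically. The column names below are hypothetical examples and are not read from the CSV file; units and special characters would still need manual handling.

```python
import pandas as pd

# Hypothetical original column names, used only for illustration.
df = pd.DataFrame(columns=['Make', 'Vehicle Class', 'Fuel Type'])

# Lowercase every name and replace spaces with underscores.
df.columns = (
    df.columns.str.strip()
              .str.lower()
              .str.replace(' ', '_', regex=False)
)

print(list(df.columns))
```

A hand-written list gives full control over each name, while this approach keeps working if columns are added or reordered.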
# convert data types from object to string
for col in cars.columns:
    if cars[col].dtype == 'object':
        cars[col] = cars[col].astype('string')
cars.info()
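# separate number of gears into its own column (codes without digits get 0 gears)
# raw strings avoid invalid-escape warnings; expand=False returns a Series
cars['gears'] = cars['transmission'].str.extract(r'(\d+)', expand=False).fillna('0').astype('int')
# the leading capital letters are the transmission type
cars['transmission_type'] = cars['transmission'].str.extract(r'([A-Z]+)', expand=False)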
cars = cars.drop(columns=['transmission'])
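To see what the two regular expressions do, here is a small sketch on made-up transmission codes. The sample values are assumptions chosen for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical transmission codes: letters for the type, optional digits for gears.
sample = pd.DataFrame({'transmission': ['AS5', 'M6', 'AV7', 'AV', 'A10']})

# First run of digits is the gear count; codes without digits become 0.
sample['gears'] = (
    sample['transmission']
    .str.extract(r'(\d+)', expand=False)
    .fillna('0')
    .astype('int')
)

# First run of capital letters is the transmission type.
sample['transmission_type'] = sample['transmission'].str.extract(r'([A-Z]+)', expand=False)

print(sample)
```

Note that a code with no digits ends up with 0 gears, so 0 here means "not stated" rather than an actual gear count.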
# Change fuel type symbols into actual types
fuel_types = {'D': 'Diesel', 'X':'Regular Gasoline', 'Z':'Premium Gasoline', 'N':'Natural Gas', 'E':'Ethanol (E85)'}
cars['fuel_type'] = cars['fuel_type'].replace(fuel_types)
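One thing worth checking after this step is whether every symbol in the column was covered by the dictionary. The sketch below uses `Series.map` on a made-up series; the code 'Q' is a hypothetical value added only to show what an unmapped symbol looks like.

```python
import pandas as pd

fuel_types = {'D': 'Diesel', 'X': 'Regular Gasoline', 'Z': 'Premium Gasoline',
              'N': 'Natural Gas', 'E': 'Ethanol (E85)'}

# Hypothetical sample; 'Q' is not a real fuel code.
codes = pd.Series(['X', 'Z', 'D', 'Q'])

# map turns symbols missing from the dictionary into NaN,
# while replace would leave them unchanged.
mapped = codes.map(fuel_types)

# List the symbols that had no entry in the dictionary.
unmapped = codes[mapped.isna()].unique().tolist()
print(unmapped)
```

An empty list would mean all symbols were translated.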
Before answering our questions, let's do some exploratory analysis.
# effect of different vehicle classes on CO2 emissions
sns.pairplot(cars, hue='vehicle_class', y_vars=['co2_emissions_g/km']);
# effect of different fuel types on CO2 emissions
sns.pairplot(cars, hue='fuel_type', y_vars=['co2_emissions_g/km']);
# effect of different manufacturers on CO2 emissions
sns.pairplot(cars, hue='make', y_vars=['co2_emissions_g/km']);
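With many categories (for example one color per manufacturer), the plots can be hard to read. A grouped summary table is one possible complement. The sketch below uses toy numbers that are assumptions for illustration only and are not results from the dataset.

```python
import pandas as pd

# Hypothetical toy values, not taken from the dataset.
toy = pd.DataFrame({
    'fuel_type': ['Diesel', 'Diesel', 'Regular Gasoline',
                  'Regular Gasoline', 'Ethanol (E85)'],
    'co2_emissions_g/km': [200, 240, 180, 220, 260],
})

# Count and median CO2 emissions per category, highest median first.
summary = (
    toy.groupby('fuel_type')['co2_emissions_g/km']
       .agg(['count', 'median'])
       .sort_values('median', ascending=False)
)

print(summary)
```

The same pattern should work with `vehicle_class` or `make` as the grouping column on the cleaned `cars` data.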