Competition - Everyone Can Learn Python Scholarship
1. Data Overview
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Import data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# Overview of data
cars.head()
cars.info()
cars.describe()
2. Data Cleaning
The overview above shows how the data should be prepared for analysis. The cleaning steps are:
- Rename the columns so they contain no spaces or capital letters.
- Convert columns with the generic object dtype to the pandas string dtype.
- Split "Transmission" into the transmission type and the number of gears, each in its own column.
- Replace the fuel type codes with the actual fuel type names.
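The four cleaning steps can also be wrapped in one reusable function, which makes them easy to check on a handful of rows before running them on the full dataset. This is a minimal sketch under stated assumptions: the raw headers ('Make', 'Transmission', 'Fuel Type', 'CO2 Emissions') and the row values are hypothetical, and the helper name clean() is made up for illustration. Unlike the hard-coded column list used in this notebook, it builds snake_case names programmatically.

```python
import pandas as pd

# Hypothetical sample rows; headers and values are assumptions for illustration
raw = pd.DataFrame({
    "Make": ["ACURA", "FORD", "TOYOTA"],
    "Transmission": ["AS6", "M6", "AV"],
    "Fuel Type": ["Z", "X", "X"],
    "CO2 Emissions": [196, 230, 120],
})

# Fuel type codes mapped to names, same mapping as in the notebook
FUEL_TYPES = {"D": "Diesel", "X": "Regular Gasoline", "Z": "Premium Gasoline",
              "N": "Natural Gas", "E": "Ethanol (E85)"}


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to a copy of df (hypothetical helper)."""
    out = df.copy()
    # 1. Lowercase column names and replace spaces with underscores
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # 2. Convert object columns to the pandas "string" dtype
    for col in out.columns:
        if out[col].dtype == object:
            out[col] = out[col].astype("string")
    # 3. Digits in the transmission code -> gears (0 if none), letters -> type
    trans = out["transmission"]
    out["gears"] = trans.str.extract(r"(\d+)", expand=False).fillna("0").astype(int)
    out["transmission_type"] = trans.str.extract(r"([A-Z]+)", expand=False)
    out = out.drop(columns=["transmission"])
    # 4. Replace fuel type codes with descriptive names
    out["fuel_type"] = out["fuel_type"].replace(FUEL_TYPES)
    return out


clean_df = clean(raw)
print(clean_df)
```

Generating names programmatically only works cleanly when the headers contain nothing but words and spaces; headers with units in parentheses would still need a hand-written list like the one used here.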
# Rename columns (lowercase, no spaces)
cols = ['make', 'model', 'vehicle_class', 'engine_size_l', 'cylinders', 'transmission', 'fuel_type', 'fuel_cons_comb_l/100km', 'co2_emissions_g/km']
cars.columns = cols
cars.head()
# Convert data types from object to string
for col in cars.columns:
    if cars[col].dtype == 'O':
        cars[col] = cars[col].astype('string')
cars.info()
# Separate the number of gears into its own column
# Transmission codes without any digits are assigned 0 gears
cars['gears'] = cars['transmission'].str.extract(r'(\d+)', expand=False).fillna('0').astype('int')
cars['transmission_type'] = cars['transmission'].str.extract(r'([A-Z]+)', expand=False)
cars = cars.drop(columns=['transmission'])
# Change fuel type codes into actual fuel type names
fuel_types = {'D': 'Diesel', 'X': 'Regular Gasoline', 'Z': 'Premium Gasoline', 'N': 'Natural Gas', 'E': 'Ethanol (E85)'}
cars['fuel_type'] = cars['fuel_type'].replace(fuel_types)
Before answering our questions, let's do some exploratory analysis.
# Relationship between vehicle class and CO2 emissions
sns.pairplot(cars, hue='vehicle_class', y_vars=['co2_emissions_g/km']);
# Relationship between fuel type and CO2 emissions
sns.pairplot(cars, hue='fuel_type', y_vars=['co2_emissions_g/km']);
# Relationship between manufacturer and CO2 emissions
sns.pairplot(cars, hue='make', y_vars=['co2_emissions_g/km']);
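The pairplots show these relationships visually; a grouped table gives the same comparison in numbers. This is a minimal sketch that uses a few hypothetical rows in place of the real cars DataFrame (the values are invented for illustration): it averages CO2 emissions per fuel type and computes the Pearson correlation between combined fuel consumption and CO2 emissions.

```python
import pandas as pd

# Hypothetical cleaned rows; the numbers are illustrative, not real measurements
sample = pd.DataFrame({
    "fuel_type": ["Regular Gasoline", "Premium Gasoline", "Diesel",
                  "Regular Gasoline", "Premium Gasoline", "Diesel"],
    "fuel_cons_comb_l/100km": [8.0, 10.0, 7.0, 9.0, 12.0, 9.0],
    "co2_emissions_g/km": [180, 230, 190, 200, 280, 240],
})

# Mean CO2 emissions and number of vehicles per fuel type, highest mean first
summary = (
    sample.groupby("fuel_type")["co2_emissions_g/km"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)

# Pearson correlation (the Series.corr default) between fuel consumption and CO2
r = sample["fuel_cons_comb_l/100km"].corr(sample["co2_emissions_g/km"])
print(f"r = {r:.3f}")
```

Swapping "fuel_type" for "vehicle_class" or "make" in the groupby call gives the matching table for the other two plots.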