Skip to content
0

CO2 Emissions and Bicycle Market Analysis.

1. Introduction

Hello everyone. My name is Farraz Nouval Hidayatullah. I'm a student majoring in Islamic management and finance at Imam Mohammad Ibn Saud Islamic University in Indonesia. At this notebook i will do data analysis. There are two cases with different tasks. The first case is analyzing the data of vehicle CO2 emissions in Canada using Python. And the second is analyzing the data of a bicycle chain store using SQL. And now let's begin with the first case.

Case no. 1: Vehicles CO2 Emission in Canada.

Background Story.

In this case, i'm a volunteer for a public policy advocacy organization in Canada, and my colleague asked me to help her draft recommendations for guidelines on CO2 emissions rules.

After researching emissions data for a wide range of Canadian vehicles, she would like me to investigate which vehicles produce lower emissions. So in this case, what i will do is quantitative analysis. In quantitative analysis, we will do descriptive and inference analysis.

There are several basic questions to answer to gain some insight from the data:

  1. What is the median engine size in liters?
  2. What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
  3. What is the correlation between fuel consumption and CO2 emissions?
  4. Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
  5. What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?

What is inside the data?

The dataset has multiple columns, such as:

  • "Make" - The company that manufactures the vehicle.
  • "Model" - The vehicle's model.
  • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
  • "Engine Size(L)" - The engine's displacement in liters.
  • "Cylinders" - The number of cylinders.
  • "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously * variable, M = Manual, 3 - 10 = the number of gears.
  • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = natural gas.
  • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
  • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.

What is the Structure of This Analysis?

I will divide this part into several sections: 

  1. Insights and summaries.
  2. How the Data Look Like? 
  3. Pre-analysis.
  4. Answering the Task Questions.
  5. Deep Dive Into the Data. 

Let's start with the 1st section.

1. Insights and summaries

After analyzing the data and finding some insights, we are going to summarize what we've found:

  1. Answers to the base questions:
  • The median engine size is 3.0L.
  • The average (mean) of fuel type 'X' is 10.0845751993
  • The average (mean) of fuel type 'Z' is 11.4227670206 
  • The average (mean) of fuel type 'E' is 16.8613513514
  • The average (mean) of fuel type 'D' is 8.8354285714
  • The correlation between fuel consumption and CO2 emissions is strong positive correlation.
  • The Average CO2 emissions of all vehicles: 36285.64705882353
  • The Average CO2 emissions of 2.0L engine size or less: 54039.0
  1.  The distribution of engine size is right skewed. And has several outliers.
  2. The distribution of fuel consumption is like / close to normal distribution. And has many outliers.
  3. The distribution of CO2 emission is like / close to normal distribution. And has many outliers.
  4. All of the numeric columns have a strong positive correlation.
  5. In quantity FORD is the brand with the most fuel consumption and CO2 emission generating, while SRT is the smallest. Because FORD is a brand with the most frequency in brand distribution, while SRT is the smallest.
  6. In average, BUGATTI is the brand with the most fuel consumption and CO2 emission generating, while SMART is the smallest.
  7. In quantity SUV - SMALL is the vehicle class with the most fuel consumption and CO2 emissions generating, while VAN - CARGO is the smallest. Because SUV - SMALL is the vehicle type with the most frequency in vehicle type distribution, while VAN - CARGO is the smallest.
  8. In average VAN - PASSSENGER is the class with the most fuel consumption and CO2 emissions generating, while STATION WAGON - SMALL is the smallest.
  9. from regression analysis we get some interesting insight:
  1. The highest 5 variables that have a positive relationship with generated CO2 emissions are:
  • fuel consumption
  • fuel type D (Diesel)
  • fuel typr Z (Premium)
  • fuel type X (Regular)
  • car with FOCUS FFV model
  1. The Highest 5 variables that have a negative relationship with generated CO2 emissions are:
  • car with SAVANA 2500 PASSENGER FFV model
  • car with EXPRESS 3500 PASSENGER FFV model
  • car with XPRESS 2500 PASSENGER FFV model
  • fuel type N (Natural gas)
  • fuel type E (Ethanol)

*This fetaure importances doesn't inform us about causality.

  1. The most important thing to include in this section is:
  • In quantity SRT is a brand with the smallest fuel consumption and CO2 emissions generating.
  • In average SMART is a brand with the smallest fuel consumption and CO2 emissions generating.
  • In quantity VAN - CARGO is a vehicle type with the smallest fuel consumption and CO2 emissions generating.
  • In average WAGON - SMALL is a vehicle type with the smallest fuel consumption and CO2 emissions generating.

2. How The Data Looks Like?

# Import the pandas and numpy packages
import pandas as pd
import numpy as np

# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')

# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()

# Preview the dataframe
cars
print(f'The dataset has {cars.shape[0]} rows')
print(f'The dataset has {cars.shape[1]} columns')
cars.info()

Some information about the dataset: 

  1. The dataset has 7385 rows and 9 columns.
  2. The dataset has numeric (float and integer) and categorical (object) types.

3. Pre-analysis

a. transform columns names into lower case and replace white spaces with underscore "_"
cars.columns = cars.columns.str.lower().str.replace(' ', '_')
cars.columns
b. check types and measures of every variables
# 1. make
cars.make.sample(5)
# 2. model
cars.model.sample(5)
# 3. vehicle_class
cars.vehicle_class.sample(5)