Analysis with Python and SQL: CO2 emission evaluation and bicycle market analysis (Bankole Moses)
CO2 EMISSION EVALUATION AND BICYCLE MARKET ANALYSIS
# Import the pandas and numpy packages
import pandas as pd
import numpy as np
# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
# Preview the dataframe
cars
First, let's check whether our data is clean and do some exploratory data analysis.
# Refer to the cars dataframe as df (a second name for the same object, not a copy)
df = cars
df.head()
df.nunique()
From our summary, the DataFrame has:
- 41 distinct makes
- 2053 distinct models
- 16 distinct vehicle classes
- 51 distinct engine sizes (L)
- 27 distinct transmissions
- 5 distinct fuel types
# Check for missing data in our dataset
missing_values = df.isnull()
for column in missing_values.columns.values.tolist():
    print(column)
    print(missing_values[column].value_counts())
    print("")
In this output, True marks a missing value and False marks a value that is present.
Every column shows only False, so our data has no missing values and needs little cleaning.
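The per-column loop above can be condensed into a single count of missing values per column. A minimal sketch, assuming a small stand-in dataframe: the `sample` frame and its values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cars dataframe; values are illustrative only
sample = pd.DataFrame({
    "Make": ["ACURA", "BMW", None],
    "Engine Size(L)": [2.0, np.nan, 3.5],
    "Cylinders": [4, 6, 6],
})

# isnull() marks missing cells as True; summing counts the True values per column
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# True only when no column contains a missing value
no_missing = bool(missing_per_column.sum() == 0)
print(no_missing)
```

Applied to our data, the same idea would be `df.isnull().sum()`, which should show 0 for every column if the loop output above holds.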
df.columns
df["Fuel Type"].value_counts()
From our summary, we have:
- 3637 cars with fuel type X
- 3202 cars with fuel type Z
- 370 cars with fuel type E
- 175 cars with fuel type D
- 1 car with fuel type N
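Raw counts are easier to compare as shares of the total. A rough sketch that rebuilds the counts listed above as a Series (the names `fuel_counts` and `fuel_shares` are introduced here for illustration) and converts them to percentages:

```python
import pandas as pd

# Counts copied from the summary above, keyed by fuel type code
fuel_counts = pd.Series({"X": 3637, "Z": 3202, "E": 370, "D": 175, "N": 1})

# Divide each count by the total to get its share, then express it as a percentage
fuel_shares = fuel_counts / fuel_counts.sum()
print((fuel_shares * 100).round(1))
```

On the dataframe itself, `df["Fuel Type"].value_counts(normalize=True)` should return the same shares directly.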
# Check the data types
df.dtypes
df.describe()
To find the median engine size:
engine_size = df[["Engine Size(L)"]]