## Python & SQL Project

### 1️⃣ Python 🐍 - CO2 Emissions

Now let's move on to the competition and challenge.

### 📖 Background

You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.

After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.

### 💾 The data I

#### You have access to seven years of CO2 emissions data for Canadian vehicles:

- "Make" - The company that manufactures the vehicle.
- "Model" - The vehicle's model.
- "Vehicle Class" - Vehicle class by utility, capacity, and weight.
- "Engine Size(L)" - The engine's displacement in liters.
- "Cylinders" - The number of cylinders.
- "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
- "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = natural gas.
- "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
- "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.
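
The 55%/45% weighting behind the combined figure can be illustrated with a small sketch. The function name `combined_consumption` and the city/highway inputs are assumptions for illustration only; the columns listed above include just the combined value.

```python
def combined_consumption(city_l_per_100km, highway_l_per_100km):
    """Weighted combined fuel consumption: 55% city, 45% highway."""
    return 0.55 * city_l_per_100km + 0.45 * highway_l_per_100km


# Made-up example values, not taken from the dataset
example = combined_consumption(10.0, 8.0)
print(round(example, 2))
```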

The data comes from the Government of Canada's open data website.
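
As a rough sketch of how the "Transmission" codes described above could be decoded, the helper below splits a code into its type and gear count. It assumes codes are stored as a letter prefix followed by an optional number (for example `AS6` or `AV`); the name `parse_transmission` is hypothetical and not part of the original project.

```python
import re

# Transmission type prefixes as described in the column list above
TRANSMISSION_TYPES = {
    "A": "Automatic",
    "AM": "Automatic Manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}

# Assumed layout: letter prefix first, then an optional gear count
_CODE_PATTERN = re.compile(r"^(AM|AS|AV|A|M)(\d{1,2})?$")


def parse_transmission(code):
    """Split a code such as 'AS6' into (type description, gears or None)."""
    match = _CODE_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized transmission code: {code!r}")
    prefix, gears = match.groups()
    return TRANSMISSION_TYPES[prefix], (int(gears) if gears else None)


print(parse_transmission("AS6"))
print(parse_transmission("AV"))
```

A helper like this could be applied to the `Transmission` column to compare emissions by transmission type or gear count.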

```
# Import the pandas, NumPy, and seaborn packages. Code in this section was provided by the DataCamp project.
# Please see Challenge 1 and 2 for my code.
import pandas as pd
import numpy as np
import seaborn as sns
# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# Create NumPy arrays, one per column
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
# Preview the dataframe
cars
```

```
# Look at the first ten items in the CO2 emissions array
cars_co2_emissions[:10]
```

### 💪 Challenge I

Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:

- What is the median engine size in liters?
- What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
- What is the correlation between fuel consumption and CO2 emissions?
- Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
- What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
- Any other insights you found during your analysis?

```
# What is the median engine size in liters?
print("The median engine size in liters is")
np.median(cars_engine_sizes)
```

```
# What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
print("The average fuel consumption by fuel type is")
cars.groupby('Fuel Type')['Fuel Consumption Comb (L/100 km)'].mean()
```

```
# What is the correlation between fuel consumption and CO2 emissions?
print("The correlation between fuel consumption and CO2 emissions is")
cars['Fuel Consumption Comb (L/100 km)'].corr(cars['CO2 Emissions(g/km)'])
```

`sns.scatterplot(x="Fuel Consumption Comb (L/100 km)", y="CO2 Emissions(g/km)", data = cars)`

```
# Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
print("The mean CO2 Emissions for Small SUVs is: ")
np.mean(cars[cars['Vehicle Class']=="SUV - SMALL"]['CO2 Emissions(g/km)'])
```

```
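print("The mean and median CO2 Emissions for Mid-Size Vehicles are: ")
# String names avoid the FutureWarning that recent pandas versions raise for np.mean/np.median in agg
cars[cars['Vehicle Class']=="MID-SIZE"]['CO2 Emissions(g/km)'].agg(['mean', 'median'])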
```

```
# What are the average CO2 emissions for all vehicles?
print("The average CO2 emissions for all vehicles is: ")
np.mean(cars_co2_emissions)
```

```
# For vehicles with an engine size of 2.0 liters or smaller?
print("The mean and median CO2 emissions for engines of 2.0 liters or smaller are: ")
cars[cars['Engine Size(L)'] <= 2.0]['CO2 Emissions(g/km)'].agg(['mean', 'median'])
```
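
The NumPy arrays created when the data was loaded can answer the same kind of question with a boolean mask. The sketch below uses made-up stand-in arrays rather than the real `cars_engine_sizes` and `cars_co2_emissions`, so the printed value is illustrative only.

```python
import numpy as np

# Made-up stand-ins for cars_engine_sizes and cars_co2_emissions
engine_sizes = np.array([1.5, 2.0, 3.5, 5.0])
co2_emissions = np.array([180.0, 200.0, 260.0, 340.0])

# True where the engine is 2.0 liters or smaller
small_engine_mask = engine_sizes <= 2.0

# Mean emissions over only the selected vehicles
small_engine_mean = co2_emissions[small_engine_mask].mean()
print(small_engine_mean)
```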

**Conclusion:**

The correlation between fuel consumption and CO2 emissions is 0.918, which indicates a strong positive correlation. Ethanol has the highest average fuel consumption, while diesel has the lowest. Mid-size vehicles have lower average CO2 emissions (222.46 g/km) than small SUVs (236.29 g/km), and both classes are below the average for all vehicles (250.58 g/km). Vehicles with an engine size of 2.0 liters or smaller had mean CO2 emissions of 198.27 g/km, well below the overall mean. In conclusion, vehicles with smaller engines tend to produce lower CO2 emissions.
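
To make the correlation figure more concrete, here is a minimal sketch of the Pearson coefficient, which is what pandas `Series.corr` computes by default. The function name `pearson_correlation` is hypothetical and the input values are made up for illustration; they are not rows from the dataset.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from each mean
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up values for illustration only
fuel_consumption = [6.0, 8.0, 10.0, 12.0]
co2_emissions = [140.0, 190.0, 230.0, 280.0]
r = pearson_correlation(fuel_consumption, co2_emissions)
print(round(r, 3))
```

A value near 1 means the two quantities rise together almost linearly, which matches the pattern in the scatter plot above.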