
We will study the pollutant emissions of vehicles sold in France, using a dataset provided by the French government. The .csv file is available at: https://www.data.gouv.fr/fr/datasets/r/57ab4020-1344-4ce6-9053-7ff2977d759e

The dataset describes cars sold in France. It first gives the information that identifies each vehicle: the make, the model name, the group to which the make belongs, commercial description, fuel, body type, engine capacity, range, tax power, engine power, weight/power ratio, gearbox type and number of gear ratios. It then gives the minimum and maximum consumption at low, medium, high, very high and mixed speeds. Finally, it gives the emissions at different speeds and during a test drive (CO2, hydrocarbons, NOx, particles), the bonus-malus applied to the vehicle and the price of the vehicle.

We will analyse the Global Warming Potential (GWP) of each category of vehicle, grouping vehicles by the type of engine used (the 'Energie' column), and then look at the other pollutants.
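To give a rough idea of what grouping by engine type looks like in pandas, here is a minimal sketch on a toy table. The column name `CO2_g_km`, the energy labels and all values are made up for illustration; only `Energie` mirrors a column used later in this notebook.

```python
import pandas as pd

# Toy data: the values and the CO2_g_km column name are invented.
toy = pd.DataFrame({
    "Energie": ["ESSENCE", "ESSENCE", "GAZOLE", "GAZOLE"],
    "CO2_g_km": [120.0, 140.0, 110.0, 130.0],
})

# Average CO2 emission per engine type.
mean_co2_by_energy = toy.groupby("Energie")["CO2_g_km"].mean()
print(mean_co2_by_energy)
```

The same pattern should carry over to the real table once the relevant emission columns have been identified.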

This dataset only covers the pollutants emitted by the engine. Some of these emissions are gases that contribute to global warming, while others affect the health of living beings. The gases that contribute to global warming are:

  • water vapor
  • CO2
  • Hydrocarbons (HC)
  • N2O (nitrous oxide)
  • Halocarbons
  • O3

https://jancovici.com/changement-climatique/gaz-a-effet-de-serre-et-cycle-du-carbone/quels-sont-les-gaz-a-effet-de-serre-quels-sont-leurs-contribution-a-leffet-de-serre/

https://www.geo.fr/environnement/hydrocarbure-definition-classification-et-utilisation-193625

Among the pollutants impacting health, we will find:

  • NOx (nitrogen oxides)
  • fine particles
  • Volatile Organic Compounds

Note also that pollutants can move from one group to the other by combining with other compounds, whether or not these are naturally present. https://www.ecologie.gouv.fr/pollution-lair-origines-situation-et-impacts

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# The file uses ';' as field separator and ',' as decimal mark
pollution = pd.read_csv("ADEME-CarLabelling.csv", sep=";", decimal=",")
pollution.head()
pollution.shape
pollution.info()
pollution.columns
# Replace spaces and hyphens in the column names with underscores
pollution.rename(columns=lambda x: x.replace(' ', '_'), inplace=True)
pollution.rename(columns=lambda x: x.replace('-', '_'), inplace=True)

# Apply the same substitution to the text values in the cells
pollution.replace(" ", "_", inplace=True, regex=True)
pollution.replace("-", "_", inplace=True, regex=True)
pollution.head()
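The loading and clean-up steps can be tried on a tiny in-memory extract. This is only a sketch: the two-row CSV and its column names are invented, and it uses a single regular expression where the cells in this notebook use two separate replacements.

```python
import io
import pandas as pd

# Made-up extract in the French CSV style: ';' separates fields and
# ',' is the decimal mark. Column names and values are hypothetical.
raw = "Nom modele;Conso-mixte\nCLIO V;5,2\nMODEL-X;6,8\n"

sample = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

# Normalise the headers: spaces and hyphens become underscores.
sample.columns = sample.columns.str.replace(r"[ -]", "_", regex=True)

# Same substitution inside the text cells; numeric columns are untouched.
sample = sample.replace(r"[ -]", "_", regex=True)

print(sample.dtypes)
print(sample)
```

With `decimal=","`, the consumption column is parsed as a float column rather than as text, which is what later numerical analysis relies on.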

We delete all the columns that are unnecessary for our analysis and remove the accents from the column names.

pollution.drop(columns = ['Gamme', 'Description_Commerciale', 'Groupe', 'Puissance_maximale', 'Puissance_nominale_électrique', 'Rapport_poids_puissance', 'Type_de_boite', 'Nombre_rapports', 'Conso_basse_vitesse_Min', 'Conso_basse_vitesse_Max', 'Conso_moyenne_vitesse_Min', 'Conso_moyenne_vitesse_Max', 'Conso_haute_vitesse_Min', 'Conso_haute_vitesse_Max', 'Conso_T_haute_vitesse_Min', 'Conso_T_haute_vitesse_Max', 'Conso_elec_Min', 'Conso_elec_Max', 'Autonomie_elec_Min', 'Autonomie_elec_Max', 'Autonomie_elec_urbain_Min', 'Autonomie_elec_urbain_Max', 'CO2_basse_vitesse_Min', 'CO2_basse_vitesse_Max', 'CO2_moyenne_vitesse_Min', 'CO2_moyenne_vitesse_Max', 'CO2_haute_vitesse_Min', 'CO2_haute_vitesse_Max', 'CO2_T_haute_vitesse_Min', 'CO2_T_haute_vitesse_Max', 'Bonus_Malus', 'Barème_Bonus_Malus', 'Masse_OM_Min', 'Masse_OM_Max'], inplace = True)
!pip install unidecode

from unidecode import unidecode

pollution.columns = [unidecode(col) for col in pollution.columns]
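If installing `unidecode` is not an option, accents can also be stripped with the standard library. The sketch below is an alternative, not what this notebook uses; `strip_accents` is a hypothetical helper name, and the sample names are simply accented column names that appear earlier in the notebook.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed."""
    # NFKD splits a character such as 'é' into 'e' plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


columns = ["Puissance_nominale_électrique", "Barème_Bonus_Malus", "Energie"]
cleaned = [strip_accents(col) for col in columns]
print(cleaned)
```

Unlike `unidecode`, this approach does not transliterate characters that have no decomposition (for example 'œ'), so it is only a partial substitute.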
pollution.head()
pollution.head(50)
# Keep rows that have a mixed-speed consumption value, plus all electric vehicles
pollution = pollution.loc[pollution['Conso_vitesse_mixte_Min'].notna() | (pollution['Energie'] == 'ELECTRIC')]
pollution.isna().sum()
pollution[pollution['Energie'] == 'ELECTRIC'].count()

As emissions from electric vehicles have not been measured, they are of no use for the analysis here. We will only keep vehicles with a thermal engine.
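One possible way to drop the electric vehicles is a boolean filter on the `Energie` column. The sketch below runs on a toy table: the `Modele` column, the non-electric labels and the rows are invented, and only `Energie` and the `'ELECTRIC'` label come from the cells above.

```python
import pandas as pd

# Toy rows: values are invented for illustration.
toy = pd.DataFrame({
    "Modele": ["A", "B", "C"],
    "Energie": ["ESSENCE", "ELECTRIC", "GAZOLE"],
})

# Keep every row whose energy type is not electric; .copy() gives an
# independent frame that can be modified safely afterwards.
thermal = toy[toy["Energie"] != "ELECTRIC"].copy()
print(thermal)
```

Applied to the real table, the same expression would be written against `pollution` instead of `toy`.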