Analyzing Carbon Footprints in SQL
Greenhouse gas emissions attributable to products—from food to sneakers to appliances—make up more than 75% of global emissions.
Cite: The Carbon Catalogue
Our dataset, which is publicly availably on nature.com, stores product carbon footprints (PCFs) for various companies. PCFs are the greenhouse gas emissions attributable to a given product, measured in CO2e (carbon dioxide equivalent).
To consult the solution, head over to the file browser and select notebook-solution.ipynb.
Connecting data
Set up connection
- In the left side bar, go to Databases > Connect a database > PostgreSQL
- Use the following credentials:
- Integration name: SQL Carbon Emissions
- Port: 5432
- Hostname: workspacedemodb.datacamp.com
- Database: emissions
- Username: emissions_codealong
- Password: emissions_codealong
Database contents
Our database contains one table, product_emissions, which looks at PCFs by product as well as the stage of production these emissions occured in.
| field | data_type |
|---|---|
| id | VARCHAR |
| year | INT |
| product_name | VARCHAR |
| company | VARCHAR |
| country | VARCHAR |
| industry_group | VARCHAR |
| weight_kg | NUMERIC |
| carbon_footprint_pcf | NUMERIC |
| upstream_percent_total_pcf | VARCHAR |
| operations_percent_total_pcf | VARCHAR |
| downstream_percent_total_pcf | VARCHAR |
1: Coca-Cola's emissions
First, let's look at a small subset of the data: emissions reported by Coca-Cola. Coke is actually made up of multiple companies around the globe, so we'll make sure our query returns data for any company name that starts with "Coca-Cola". Coke used to report for every single different product it has, so we will limit the results to six.
- Select all fields from the
product_emissionstable where the company name begins with "Coca-Cola", limiting to the first six results.
-- Select all fields where the company name is Coca-Cola, limiting to the first six results
SELECT *
FROM product_emissions
WHERE company LIKE 'Coca-Cola%'
LIMIT 6;A little background:
- Upstream emissions: emissions that occur before the company's own operations such as emissions created by manufacturing bottles that Coke buys from suppliers
- Operations emissions: emissions that the company creates directly, such as when Coke is bottling its product
- Downstream emissions: emissions that occur after the product leaves the company, such as after Coke has sold drinks to McDonald's
2: Most recent data
We'll focus on recent emissions data during this code-along. When was the most recent data collected?
- Return the most recent year for which data was collected.
-- Return the most recent year for which data was collected
SELECT MAX(year)
FROM product_emissions;3: Targeting major emitters
What are the industries with the most emissions in 2017 (the most recent year that data is available)?
- Return the
industry_groupand a rounded total ofcarbon_footprint_pcffor each industry, aliasing astotal_industry_footprint. - Limit to data for 2017 and order by
total_industry_footprint.