Analyzing Carbon Footprints in SQL
Greenhouse gas emissions attributable to products—from food to sneakers to appliances—make up more than 75% of global emissions.
Cite: The Carbon Catalogue
Our dataset, which is publicly availably on nature.com, stores product carbon footprints (PCFs) for various companies. PCFs are the greenhouse gas emissions attributable to a given product, measured in CO2e (carbon dioxide equivalent).
Our Database
Our database contains one table, which looks at PCFs by product as well as the stage of production these emissions occured in.
product_emissions
product_emissions| field | data_type |
|---|---|
| id | VARCHAR |
| year | INT |
| product_name | VARCHAR |
| company | VARCHAR |
| country | VARCHAR |
| industry_group | VARCHAR |
| weight_kg | NUMERIC |
| carbon_footprint_pcf | NUMERIC |
| upstream_percent_total_pcf | VARCHAR |
| operations_percent_total_pcf | VARCHAR |
| downstream_percent_total_pcf | VARCHAR |
1: Coca-Cola's emissions
First, let's look at a small subset of the data: emissions reported by Coca-Cola. Coke is actually made up of multiple companies around the globe, so we'll make sure our query returns data for any company name that starts with "Coca-Cola". Coke used to report for every single different product it has, so we will limit the results to six.
- Select all fields from the
product_emissionstable where the company name begins with "Coca-Cola", limiting to the first six results.
-- Select all fields where the company name is Coca-Cola, limiting to the first six results
SELECT *
FROM product_emissions
WHERE company LIKE 'Coca-Cola%'
LIMIT 6;A little background:
- Upstream emissions: emissions that occur before the company's own operations such as emissions created by manufacturing bottles that Coke buys from suppliers
- Operations emissions: emissions that the company creates directly, such as when Coke is bottling its product
- Downstream emissions: emissions that occur after the product leaves the company, such as after Coke has sold drinks to McDonald's
2: Most recent data
We'll focus on recent emissions data during this code-along. When was the most recent data collected?
- Return the most recent year for which data was collected.
-- Return the most recent year for which data was collected
SELECT MAX(year)
FROM product_emissions;3: Targeting major emitters
What are the industries with the most emissions in 2017 (the most recent year that data is available)?