Climate Change and Impacts in Africa
According to the United Nations, climate change refers to long-term shifts in temperatures and weather patterns. Such shifts can be natural, due to changes in the sun’s activity or large volcanic eruptions. But since the 1800s, human activities have been the main driver of climate change, primarily due to the burning of fossil fuels like coal, oil, and gas.
The consequences of climate change now include, among others, intense droughts, water scarcity, severe fires, rising sea levels, flooding, melting polar ice, catastrophic storms, and declining biodiversity.
You work for a non-governmental organization tasked with reporting on the state of climate change in Africa at the upcoming African Union Summit. The head of analytics has provided you with the IEA-EDGAR CO2 dataset, which you will clean, combine, and analyze to create a report on the state of climate change in Africa. You will also provide insights on the impact of climate change on African regions (with four countries, one from each African region, as case studies).
Dataset
The dataset, IEA-EDGAR CO2, is a component of the EDGAR (Emissions Database for Global Atmospheric Research) Community GHG database version 7.0 (2022), including or based on data from IEA (2021) Greenhouse Gas Emissions from Energy, www.iea.org/statistics, as modified by the Joint Research Centre. The data source was the EDGARv7.0_GHG website provided by Crippa et al. (2022) and with DOI.
The dataset contains three sheets, `IPCC 2006`, `IPCC 1996`, and `TOTALS BY COUNTRY`, on the amount of CO2 (a greenhouse gas) generated by countries between 1970 and 2021. You can download the dataset from your workspace or inspect the dataset directly here.
TOTALS BY COUNTRY SHEET
This sheet contains the annual CO2 (kt) produced between 1970 - 2021 in each country. The relevant columns in this sheet are:
Columns | Description |
---|---|
C_group_IM24_sh | The region of the world |
Country_code_A3 | The country code |
Name | The name of the country |
Y_1970 - Y_2021 | The amount of CO2 (kt) from 1970 - 2021 |
IPCC 2006
These sheets contain the amount of CO2 by country and the industry responsible.
Columns | Description |
---|---|
C_group_IM24_sh | The region of the world |
Country_code_A3 | The country code |
Name | The name of the country |
Y_1970 - Y_2021 | The amount of CO2 (kt) from 1970 - 2021 |
ipcc_code_2006_for_standard_report_name | The industry responsible for generating CO2 |
Instructions
The head of analytics in your organization has specifically asked you to do the following:
- Clean and tidy the datasets.
- Create a line plot to show the trend of `CO2` levels across the African regions.
- Determine the relationship between time (`Year`) and `CO2` levels across the African regions.
- Determine if there is a significant difference in the `CO2` levels among the African regions.
- Determine the most common (top 5) industries in each African region.
- Determine the industry responsible for the largest amount of CO2 (on average) in each African region.
- Predict the `CO2` levels (in each African region) in the year 2025 using linear regression.
- Determine if `CO2` levels affect annual `temperature` in the selected African countries using linear regression.
# Setup
library(dplyr)
library(readxl)
library(readr)
library(tidyr)
library(ggplot2)
library(assertthat)
library(broom)
# we need only the African regions
african_regions <- c('Eastern_Africa', 'Western_Africa', 'Southern_Africa', 'Northern_Africa')
ipcc_2006_africa <- read_xlsx("IEA_EDGAR_CO2_1970-2021.xlsx", sheet = 'IPCC 2006', skip = 10) %>%
  filter(C_group_IM24_sh %in% african_regions)
totals_by_country_africa <- read_xlsx("IEA_EDGAR_CO2_1970-2021.xlsx", sheet = 'TOTALS BY COUNTRY', skip = 10) %>%
  filter(C_group_IM24_sh %in% african_regions)
# Read the temperature dataset covering four African countries,
# one from each African region:
# Nigeria: Western Africa
# Ethiopia: Eastern Africa
# Tunisia: Northern Africa
# Mozambique: Southern Africa
temperatures <- read_csv('temperatures.csv')
Instruction 1: Clean and tidy the datasets
Tasks
- Rename `C_group_IM24_sh` to `Region`, `Country_code_A3` to `Code`, and `ipcc_code_2006_for_standard_report_name` to `Industry` in the corresponding African datasets.
- Drop `IPCC_annex`, `ipcc_code_2006_for_standard_report`, and `Substance` from the corresponding datasets.
- Gather `Y_1970` to `Y_2021` into two columns, `Year` and `CO2`. Drop rows where `CO2` is missing.
- Convert `Year` to `int` type.
# Your code here
# Rename the region, country code, and industry columns
ipcc_2006_africa <- ipcc_2006_africa %>%
  rename('Region' = 'C_group_IM24_sh',
         'Code' = 'Country_code_A3',
         'Industry' = 'ipcc_code_2006_for_standard_report_name')
totals_by_country_africa <- totals_by_country_africa %>%
  rename('Region' = 'C_group_IM24_sh',
         'Code' = 'Country_code_A3')

# Drop the columns that are not needed for the analysis
ipcc_2006_africa <- ipcc_2006_africa %>%
  select(-IPCC_annex, -ipcc_code_2006_for_standard_report, -Substance)
totals_by_country_africa <- totals_by_country_africa %>%
  select(-IPCC_annex, -Substance)

# Gather Y_1970 ... Y_2021 into Year and CO2, dropping rows with missing CO2
ipcc_2006_africa <- ipcc_2006_africa %>%
  pivot_longer(
    cols = starts_with("Y_"),
    names_to = "Year",
    names_prefix = "Y_",
    values_to = "CO2",
    values_drop_na = TRUE)
totals_by_country_africa <- totals_by_country_africa %>%
  pivot_longer(
    cols = starts_with("Y_"),
    names_to = "Year",
    names_prefix = "Y_",
    values_to = "CO2",
    values_drop_na = TRUE)

# Convert Year from character to integer
ipcc_2006_africa$Year <- as.integer(ipcc_2006_africa$Year)
totals_by_country_africa$Year <- as.integer(totals_by_country_africa$Year)

# Inspect the tidied datasets
colnames(ipcc_2006_africa)
colnames(totals_by_country_africa)
head(ipcc_2006_africa)
head(totals_by_country_africa)
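The tidying steps above can be sanity-checked with a few assertions. The following is only a sketch on a small made-up data frame (`toy_wide` and its values are hypothetical, not taken from the IEA-EDGAR dataset), assuming the `tidyr` and `assertthat` packages loaded in the setup are available:

```r
library(tidyr)
library(assertthat)

# Hypothetical wide data with one missing value
toy_wide <- data.frame(
  Region = c("A", "B"),
  Y_2020 = c(1.5, NA),
  Y_2021 = c(2.0, 3.0)
)

# Same reshaping pattern as used for the real datasets
toy_long <- pivot_longer(
  toy_wide,
  cols = starts_with("Y_"),
  names_to = "Year",
  names_prefix = "Y_",
  values_to = "CO2",
  values_drop_na = TRUE
)
toy_long$Year <- as.integer(toy_long$Year)

# Checks: Year is an integer column and no missing CO2 values remain
assert_that(is.integer(toy_long$Year))
assert_that(noNA(toy_long$CO2))
```

The same two `assert_that()` calls could be pointed at `ipcc_2006_africa` and `totals_by_country_africa` after tidying.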
Instruction 2: Show the trend of CO2 levels across the African regions
Tasks
- Group the `totals_by_country_africa` dataset by `Region` and `Year` and summarise the `CO2` column using the `mean()` function.
- Save the summarised column as `co2_level` and the resulting data frame as `co2_level_by_region_per_year`.
- Create a line plot using `ggplot()` to show the trend of `CO2` levels by `Year` across the African regions. For testing purposes, save the line plot as the `trend_of_CO2_emission_plot` variable.
# Your code here
co2_level_by_region_per_year <- totals_by_country_africa %>%
  group_by(Region, Year) %>%
  summarize(co2_level = mean(CO2))
trend_of_CO2_emission_plot <- ggplot(co2_level_by_region_per_year, aes(Year, co2_level, color = Region)) +
  geom_line() +
  ggtitle("CO2 levels across the African Regions between 1970 and 2021")
Instruction 3: Determine the relationship between time and CO2 levels in each African region
Tasks
- Using the `totals_by_country_africa` dataset, conduct a Spearman's correlation to determine the relationship between time (`Year`) and `CO2` within each African `Region`.
- Save the results in a variable called `relationship_btw_time_CO2`.
# Your code here
relationship_btw_time_CO2 <- totals_by_country_africa %>%
  group_by(Region) %>%
  summarise(r = cor(Year, CO2, method = 'spearman'))
relationship_btw_time_CO2
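`cor()` returns only the correlation coefficient. If a p-value is also wanted, base R's `cor.test()` accepts the same `method = "spearman"` argument. A minimal sketch on made-up numbers (the `toy` data frame is hypothetical and not part of the dataset):

```r
# Hypothetical yearly values, for illustration only
toy <- data.frame(
  Year = 2000:2009,
  CO2  = c(10, 12, 11, 15, 18, 17, 21, 25, 24, 30)
)

# Spearman's rank correlation with a significance test
spearman_test <- cor.test(toy$Year, toy$CO2, method = "spearman")

spearman_test$estimate  # rho, the same value cor(..., method = "spearman") gives
spearman_test$p.value   # p-value for the null hypothesis rho = 0
```

On the real data, which contains many tied values, `cor.test()` may warn that it cannot compute an exact p-value and fall back to an approximation.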
Instruction 4: Determine if there is a significant difference in the CO2 levels among the African Regions
Tasks
- Using `totals_by_country_africa`, conduct an ANOVA using the `aov()` function on `CO2` by `Region`. Save the results as `aov_results`.
- Conduct a post hoc test (with Bonferroni correction) to find the source of the significant difference. Save the results as `pw_ttest_result`.
- Is it true that the `CO2` levels of the `Southern_Africa` and `Northern_Africa` regions do not differ significantly? The previous task should provide you with the answer.
# Your code here
aov_results <- aov(CO2 ~ Region, data = totals_by_country_africa)
pw_ttest_result <- pairwise.t.test(
  totals_by_country_africa$CO2,
  totals_by_country_africa$Region,
  p.adjust.method = "bonferroni"
)
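To answer a question about one specific pair of regions, the adjusted p-values can be read from the `p.value` matrix that `pairwise.t.test()` returns. The sketch below uses made-up groups `A`, `B`, and `C` (hypothetical values, not the real regions) to show how that lookup might be done:

```r
# Hypothetical groups: A and B have equal means, C is clearly higher
toy <- data.frame(
  Region = rep(c("A", "B", "C"), each = 5),
  CO2 = c(9, 10, 11, 10, 10,
          10, 9, 11, 10, 10,
          19, 20, 21, 20, 20)
)

# Overall ANOVA and its p-value
toy_aov <- aov(CO2 ~ Region, data = toy)
anova_p <- summary(toy_aov)[[1]][["Pr(>F)"]][1]

# Pairwise t-tests with Bonferroni correction
toy_pw <- pairwise.t.test(toy$CO2, toy$Region, p.adjust.method = "bonferroni")

# The matrix is lower-triangular: rows are the 2nd..last levels,
# columns are the 1st..second-to-last levels
toy_pw$p.value

# A pair differs significantly (at the 5% level) if its adjusted p-value is below 0.05
a_vs_b_differs <- toy_pw$p.value["B", "A"] < 0.05
a_vs_c_differs <- toy_pw$p.value["C", "A"] < 0.05
```

For the real result, the same indexing applies with the region names as row and column labels; factor levels are sorted alphabetically, so the later name of a pair is the row and the earlier one the column.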
Instruction 5: Determine the most common (top 5) industries in each African region.
Tasks
- Group `ipcc_2006_africa` by `Region` and then count `Industry` to derive the counts of the industries in each region.
- Select the top 5 industries using `slice_max()` and save the result as `top_5_industries`. Do not set the `with_ties` parameter of `slice_max()` to `FALSE`.
# Your code here
top_5_industries <- ipcc_2006_africa %>%
  group_by(Region) %>%
  count(Industry) %>%
  slice_max(order_by = n, n = 5)
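Leaving `with_ties` at its default (`TRUE`) means a region can return more than five rows when several industries share the same count. A small sketch with made-up counts (the `toy_counts` data frame is hypothetical) illustrates the difference:

```r
library(dplyr)

# Hypothetical industry counts for a single region; x and y are tied
toy_counts <- data.frame(
  Region   = c("A", "A", "A", "A"),
  Industry = c("w", "x", "y", "z"),
  n        = c(5, 3, 3, 1)
)

# Default behaviour: tied rows at the cut-off are all kept
with_ties_kept <- toy_counts %>%
  group_by(Region) %>%
  slice_max(order_by = n, n = 2)

# with_ties = FALSE: exactly n rows per group
with_ties_dropped <- toy_counts %>%
  group_by(Region) %>%
  slice_max(order_by = n, n = 2, with_ties = FALSE)

nrow(with_ties_kept)     # 3 rows: w, x, y
nrow(with_ties_dropped)  # 2 rows
```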
Instruction 6: Determine the industry responsible for the most amount of CO2 (on average) in each African Region
Tasks
- Group `ipcc_2006_africa` by `Region` and `Industry` and summarise `CO2` using `mean()` to get the average `CO2` by industry in each region.
- Next, select the top industry from each region using `slice_max()`. Save the results as `top_industry_by_co2_emission`. The results should contain four (4) rows and three (3) columns.
# Your code here
top_industry_by_co2_emission <- ipcc_2006_africa %>%
  group_by(Region, Industry) %>%
  summarize(co2_level = mean(CO2)) %>%
  slice_max(n = 1, order_by = co2_level)