Unveiling Trends in Renewable Energy
The Data Analyst Wizard
Background
The race to net-zero emissions is on! As the world battles climate change and rising energy demands, renewable energy is taking center stage. Solar, wind, and hydro are revolutionizing the way we power our lives. Some countries are leading the charge, while others are falling behind. But which nations are making the biggest impact? What's driving their success? And what lessons can we learn to accelerate the green energy transition?
As a data analyst at NextEra Energy, one of the world's leading renewable energy providers, your mission is to dive deep into global trends and uncover the story behind the surge in clean energy. Using real-world data, you'll explore how economic, demographic, and environmental factors (like GDP, population, carbon emissions, and policy influence) shape energy production around the world.
So dig into the data, reveal the story behind the numbers, and create stunning visualizations that spark action!
The data
Your team has gathered a global renewable energy dataset covering energy production, investments, policies, and economic factors shaping renewable adoption worldwide:
Basic Identifiers
- Country: Country name
- Year: Calendar year (YYYY)
- Energy Type: Type of renewable energy (e.g., Solar, Wind)
Energy Metrics
- Production (GWh): Renewable energy produced (Gigawatt-hours)
- Installed Capacity (MW): Installed renewable capacity (Megawatts)
- Investments (USD): Total investment in renewables (US Dollars)
- Energy Consumption (GWh): Total national energy use
- Energy Storage Capacity (MWh): Capacity of energy storage systems
- Grid Integration Capability (Index): Scale of 0-1; ability to handle renewables in grid
- Electricity Prices (USD/kWh): Average cost of electricity
- Energy Subsidies (USD): Government subsidies for energy sector
- Proportion of Energy from Renewables (%): Share of renewables in total energy mix
Innovation & Tech
- R&D Expenditure (USD): R&D spending on renewables
- Renewable Energy Patents: Number of patents filed
- Innovation Index (Index): Global innovation score (0-100)
Economy & Policy
- GDP (USD): Gross domestic product
- Population: Total population
- Government Policies: Number of policies supporting renewables
- Renewable Energy Targets: Whether national targets are in place (1 = Yes, 0 = No)
- Public-Private Partnerships in Energy: Number of active collaborations
- Energy Market Liberalization (Index): Scale of 0-1
Social & Governance
- Ease of Doing Business (Score): World Bank index (0-100)
- Regulatory Quality: Governance score (-2.5 to 2.5)
- Political Stability: Governance score (-2.5 to 2.5)
- Control of Corruption: Governance score (-2.5 to 2.5)
Environment & Resources
- CO2 Emissions (MtCO2): Emissions in million metric tons
- Average Annual Temperature (°C): Country's avg. temp
- Solar Irradiance (kWh/m²/day): Solar energy availability
- Wind Speed (m/s): Average wind speed
- Hydro Potential (Index): Relative hydropower capability (0-1)
- Biomass Availability (Tons/year): Total available biomass
Challenge
As a data analyst at NextEra Energy, your mission is to explore a rich, multi-dimensional dataset and uncover powerful insights about global renewable energy trends.
Your submission should focus on:
- Exploratory Insights: Alongside your model, include analysis addressing key trends:
  - Which regions are investing the most efficiently in renewables?
  - How do economic, environmental, and policy factors relate to production levels?
- Visual Storytelling: Create at least one compelling visualization to showcase a key insight about global renewable energy trends.
Your analysis will uncover trends, spotlight key drivers, and guide smarter global energy decisions. Policymakers, investors, and sustainability leaders are hungry for clarity. Your insights could influence real-world energy strategies and help accelerate the path to a more sustainable future.
Judging Criteria
This is a community-based competition. Once the competition concludes, you'll have the opportunity to view and vote for the best submissions.
Checklist before publishing
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
- Check that all the cells run without error.
Data is the new fuel - let's generate insights and electrify the future!
- Correlations: production levels, metrics, innovation and tech, economy and policy, social and governance, environment and resources
- Global trends: years, proportion of different renewables
Executive Summary
The overall average proportion of renewables is about 49%, with little difference between countries.
The UK is about 10 percentage points above the average (roughly 60%) and has the highest share. Brazil and Japan trail the list at about 46%.
Correlations
There is little difference in renewable share among energy types. Hydro is slightly higher than the rest.
Overall, there is practically no correlation between renewable energy proportion and the predictor factors.
With all countries pooled, there is practically no correlation among the predictor variables either. Each country needs to be examined separately.
Different countries, different nuances
Depending on the country, some factors have a positive and others a negative association with renewables.
In either case the correlations are very low; we are looking only at minute nuances.
The UK stands out as the country with a large number of factors showing some correlation with renewable energy sources. It also has the highest renewable energy proportion, possibly because of relevant long-term planning.
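A minimal sketch of the per-country view described above, assuming the column names used in this notebook's code (`Country`, `Proportion of Energy from Renewables`). The helper `per_country_corr` is hypothetical, and the small frame is synthetic stand-in data, not the competition dataset.

```python
import numpy as np
import pandas as pd

TARGET = 'Proportion of Energy from Renewables'

def per_country_corr(df, target=TARGET, group_col='Country'):
    """Pearson correlation of every numeric column with `target`, per country.

    Hypothetical helper: returns a DataFrame with one row per country
    and one column per numeric predictor.
    """
    numeric = df.select_dtypes('number').columns.drop(target)
    rows = {}
    for country, grp in df.groupby(group_col):
        rows[country] = grp[numeric].corrwith(grp[target])
    return pd.DataFrame(rows).T

# Synthetic stand-in data (NOT the competition dataset), for illustration only
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Country': np.repeat(['A', 'B'], 50),
    'GDP': rng.normal(size=100),
    'Investments (USD)': rng.normal(size=100),
})
# Country A: target rises with GDP; country B: target falls with GDP
sign = np.where(demo['Country'] == 'A', 1.0, -1.0)
demo[TARGET] = sign * demo['GDP'] + rng.normal(scale=0.1, size=100)

country_corr = per_country_corr(demo)
print(country_corr.round(2))
```

In this synthetic example the two countries have opposite signs for `GDP`, which is one way a pooled correlation can look weak even when individual countries show a relationship. Whether that applies to the real data would need to be checked on the actual dataset.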
Trends by country and energy types
There is no overall improvement in renewable energy production across countries. No general trend is discernible.
Different countries appear to invest in different renewable energy types.
Canada puts more emphasis on Biomass. Brazil, Germany, and the USA increased Solar energy output over the years. France, Australia, and China invested mainly in Geothermal.
China and the USA also increased their Hydro energy production over the years.
The UK has targeted multiple sources, including Hydro, Geothermal, and Solar energy, and increased production of all three types.
Production from some energy types declined over the years, and which types declined differs by country.
Hydro declined for most countries. Solar and Geothermal energy seem to be on the rise.
The UK example shows how irregular renewable production is from year to year. All in all, only Solar energy picked up; the rest declined in recent years in the UK.
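One way to put numbers on "rising" and "declining" is a least-squares slope of yearly mean production for each country and energy type. The sketch below assumes the columns `Country`, `Energy Type`, `Year`, and `Production (GWh)` used in this notebook's code. The helper `production_trends` is hypothetical, and the data is synthetic, not the competition dataset.

```python
import numpy as np
import pandas as pd

def production_trends(df, value='Production (GWh)'):
    """Slope of a least-squares line through yearly mean production.

    Hypothetical helper: returns one slope (GWh per year) for each
    (Country, Energy Type) pair. Positive means rising, negative falling.
    Assumes every pair has at least two distinct years.
    """
    yearly = (df.groupby(['Country', 'Energy Type', 'Year'])[value]
                .mean()
                .reset_index())

    def slope(g):
        # np.polyfit with deg=1 returns [slope, intercept]
        return np.polyfit(g['Year'], g[value], 1)[0]

    return (yearly.groupby(['Country', 'Energy Type'])[['Year', value]]
                  .apply(slope)
                  .unstack('Energy Type'))

# Synthetic stand-in data (NOT the competition dataset), for illustration only
years = np.arange(2000, 2010)
demo = pd.concat([
    pd.DataFrame({'Country': 'A', 'Energy Type': 'Solar', 'Year': years,
                  'Production (GWh)': 100.0 + 5.0 * (years - 2000)}),
    pd.DataFrame({'Country': 'A', 'Energy Type': 'Hydro', 'Year': years,
                  'Production (GWh)': 300.0 - 2.0 * (years - 2000)}),
], ignore_index=True)

trend = production_trends(demo)
print(trend.round(2))
```

A single slope hides the year-to-year irregularity noted above, so it is best read alongside a line plot of the yearly values rather than as a replacement for one.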
Loading and surveying data
import pandas as pd
import os
import matplotlib.pyplot as plt
# Loading and surveying data
# List the files in the 'data' directory
print(os.listdir('data'))
# Read the CSV file from the 'data' directory
df = pd.read_csv('data/Training_set_augmented.csv')
print(df.head())
print('\n***info on columns of dataframe:\n')
df.info()  # info() prints its report directly and returns None
# Check categories
print('\n***Countries:')
print(df['Country'].unique())
print('\n***Years from to:')
print(df['Year'].min(), df['Year'].max())
print('\n***Renewable energy types')
print(df['Energy Type'].unique())
print('\n***Nominals:')
cols = ['Government Policies', 'Renewable Energy Targets',
        'Public-Private Partnerships in Energy', 'Energy Market Liberalization']
for col in cols:
    print(f' {col} {df[col].unique()}')
Renewable Energy by country and energy type
#Renewable energy by country and energy type
print('\n***Average proportion of energy from renewables:')
print(df['Proportion of Energy from Renewables'].mean())
print('\n***Proportion of renewable energy by country:')
df_country = df.groupby('Country')['Proportion of Energy from Renewables'].mean().sort_values(ascending=False)
print(df_country)
# Plotting the bar plot for proportion of renewable energy by country
plt.figure(figsize=(10, 6))
df_country.plot(kind='bar')
plt.title('Proportion of Renewable Energy by Country')
plt.xlabel('Country')
plt.ylabel('Proportion of Energy from Renewables')
plt.show()
print('\n***Proportion of energy from renewables by energy type:')
df_et=df.groupby('Energy Type')['Proportion of Energy from Renewables'].mean().sort_values(ascending=False)
# Renewable energy by type
print(df_et)
df_et.plot(kind='bar')
plt.title('Renewable energy by type')
plt.xlabel('Energy type')
plt.ylabel('Proportion of Energy from Renewables')
plt.show()
print('\n***Average production (GWh) by renewable energy type:')
print(df.groupby('Energy Type')['Production (GWh)'].mean().sort_values(ascending=False))
Summary:
The overall average proportion of renewables is about 49%, with little difference between countries.
The UK is about 10 percentage points above the average (roughly 60%) and has the highest share. Brazil and Japan trail the list at about 46%.
There is little difference in renewable share among energy types. Hydro is slightly higher than the rest.
Correlation of features in the data
import seaborn as sns
import matplotlib.pyplot as plt
# Correlation of variables
print('\n***Correlation of renewable energy proportion to all other factors:\n')
X = pd.get_dummies(df.drop(['Country', 'Energy Type', 'Production (GWh)', 'Proportion of Energy from Renewables'], axis=1))
y1 = df['Proportion of Energy from Renewables']
y2 = df['Production (GWh)']
# Calculate correlation of X with y1 and y2 separately
cormat_y1 = X.corrwith(y1)
cormat_y2 = X.corrwith(y2)
cormat=pd.DataFrame({'Proportion of renewables':cormat_y1,'Production (GWh)' :cormat_y2})
print(cormat)
print('\n***Heatmaps of cross correlation of variables:\n')
X = df[['Investments (USD)', 'Energy Consumption', 'Energy Storage Capacity', 'Grid Integration Capability', 'Electricity Prices', 'Energy Subsidies', 'Production (GWh)']]
# Calculate the correlation matrix
corr_matrix = X.corr()
# Plot the heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
# Select another set of variables
Xp = df[['Solar Irradiance', 'Wind Speed', 'Hydro Potential', 'Biomass Availability','Production (GWh)']]
corr_matrix=Xp.corr()
sns.heatmap(corr_matrix, annot=True,cmap='coolwarm')
plt.show()
print('\n***Representative scatterplots of renewable energy vs. potential controllers:')
# Scatterplots with a fitted regression line for a few representative predictors
columns = ['Year', 'Investments (USD)', 'GDP']
for col in columns:
    sns.lmplot(x=col, y='Proportion of Energy from Renewables', data=df)
    plt.title(col)
    plt.show()
cols = ['Government Policies', 'Energy Market Liberalization']
for col in cols:
    sns.barplot(data=df, x=col, y='Proportion of Energy from Renewables')
    plt.title(col)
    plt.show()