Unveiling Trends in Renewable Energy 🌍🔋

The Data Analyst Wizard

📖 Background

The race to net-zero emissions is on! As the world battles climate change and rising energy demands, renewable energy is taking center stage. Solar, wind, and hydro are revolutionizing the way we power our lives. Some countries are leading the charge, while others are falling behind. But which nations are making the biggest impact? What’s driving their success? And what lessons can we learn to accelerate the green energy transition?

As a data analyst at NextEra Energy, one of the world’s leading renewable energy providers, your mission is to dive deep into global trends and uncover the story behind the surge in clean energy. Using real-world data, you'll explore how economic, demographic, and environmental factors (like GDP, population, carbon emissions, and policy influence) shape energy production around the world.

So dig into the data, reveal the story behind the numbers, and create stunning visualizations that spark action! ⚡🌱

💾 The data

Your team has gathered a global renewable energy dataset covering energy production, investments, policies, and economic factors shaping renewable adoption worldwide:

🌍 Basic Identifiers

  • Country – Country name
  • Year – Calendar year (YYYY)
  • Energy Type – Type of renewable energy (e.g., Solar, Wind)
⚑ Energy Metrics
  • Production (GWh) – Renewable energy produced (Gigawatt-hours)
  • Installed Capacity (MW) – Installed renewable capacity (Megawatts)
  • Investments (USD) – Total investment in renewables (US Dollars)
  • Energy Consumption (GWh) – Total national energy use
  • Energy Storage Capacity (MWh) – Capacity of energy storage systems
  • Grid Integration Capability (Index) – Scale of 0–1; ability to handle renewables in grid
  • Electricity Prices (USD/kWh) – Average cost of electricity
  • Energy Subsidies (USD) – Government subsidies for energy sector
  • Proportion of Energy from Renewables (%) – Share of renewables in total energy mix
🧠 Innovation & Tech
  • R&D Expenditure (USD) – R&D spending on renewables
  • Renewable Energy Patents – Number of patents filed
  • Innovation Index (Index) – Global innovation score (0–100)
πŸ’° Economy & Policy
  • GDP (USD) – Gross domestic product
  • Population – Total population
  • Government Policies – Number of policies supporting renewables
  • Renewable Energy Targets – Whether national targets are in place (1 = Yes, 0 = No)
  • Public-Private Partnerships in Energy – Number of active collaborations
  • Energy Market Liberalization (Index) – Scale of 0–1
πŸ§‘β€πŸ€β€πŸ§‘ Social & Governance
  • Ease of Doing Business (Score) – World Bank index (0–100)
  • Regulatory Quality – Governance score (-2.5 to 2.5)
  • Political Stability – Governance score (-2.5 to 2.5)
  • Control of Corruption – Governance score (-2.5 to 2.5)
🌿 Environment & Resources
  • CO2 Emissions (MtCO2) – Emissions in million metric tons
  • Average Annual Temperature (Β°C) – Country’s avg. temp
  • Solar Irradiance (kWh/mΒ²/day) – Solar energy availability
  • Wind Speed (m/s) – Average wind speed
  • Hydro Potential (Index) – Relative hydropower capability (0–1)
  • Biomass Availability (Tons/year) – Total available biomass

💪 Challenge

As a data analyst at NextEra Energy, your mission is to explore a rich, multi-dimensional dataset and uncover powerful insights about global renewable energy trends.

Your submission should focus on:

  1. 🌍 Exploratory Insights:
    Alongside your model, include analysis addressing key trends:

    • Which regions are investing the most efficiently in renewables?
    • How do economic, environmental, and policy factors relate to production levels?
  2. 📊 Visual Storytelling:
    Create at least one compelling visualization to showcase a key insight about global renewable energy trends.

🔎 Your analysis will uncover trends, spotlight key drivers, and guide smarter global energy decisions. Policymakers, investors, and sustainability leaders are hungry for clarity. Your insights could influence real-world energy strategies and help accelerate the path to a more sustainable future.

🧑‍⚖️ Judging Criteria

This is a community-based competition. Once the competition concludes, you'll have the opportunity to view and vote for the best submissions.

✅ Checklist before publishing

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
  • Check that all the cells run without error.

⏳ Data is the new fuel - let’s generate insights and electrify the future!

  1. Correlations: production levels versus energy metrics, innovation and tech, economy and policy, social and governance, and environment and resources

  2. Global trends: proportion of the different renewables over the years

Executive Summary

The overall average proportion of renewables is about 49%, with little difference between countries.

The UK has the highest share at about 60%, roughly 10 percentage points above the average. Brazil and Japan trail the list at about 46%.


Correlations

There is little difference in renewable proportion among the energy types; Hydro is slightly higher than the rest.

Overall, there is practically no correlation between renewable energy proportion and the predictor variables.

With all countries pooled together, there is practically no correlation among the predictor variables either. Each country needs to be examined separately.


Different countries, different nuances

Depending on the country, some factors have a positive association with renewables and others a negative one.

In either case the correlations are very low, so only minute nuances are visible here.

The UK stands out as the country with a large number of factors showing some correlation with renewable energy sources. The UK also has the highest renewable energy proportion, possibly because of relevant long-term planning.
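
The per-country view described above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's own code: the helper name `per_country_correlations` and the toy values are assumptions made for the example, and only the column names `Country`, `GDP`, and `Proportion of Energy from Renewables` come from the notebook.

```python
import pandas as pd

def per_country_correlations(df, target, group_col="Country"):
    """Correlate every numeric column with `target`, separately for each group.

    Hypothetical helper for illustration; returns one row per country and
    one column per numeric predictor (Pearson correlation).
    """
    numeric = df.select_dtypes("number").columns.drop(target)
    rows = {}
    for name, grp in df.groupby(group_col):
        rows[name] = grp[numeric].corrwith(grp[target])
    return pd.DataFrame(rows).T

# Toy data (illustrative values only, not taken from the dataset)
toy = pd.DataFrame({
    "Country": ["A"] * 4 + ["B"] * 4,
    "GDP": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "Proportion of Energy from Renewables": [10.0, 20.0, 30.0, 40.0,
                                             40.0, 30.0, 20.0, 10.0],
})
corr_by_country = per_country_correlations(
    toy, "Proportion of Energy from Renewables")
print(corr_by_country)
```

In the toy frame the pooled correlation between GDP and the renewable share is zero, while the two countries show opposite signs, which is the kind of nuance a pooled analysis can hide.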


Trends by country and energy types

There is no overall improvement in renewable energy production across countries, and no general trend is discernible.

Different countries appear to invest in different renewable energy types.

Canada puts more emphasis on Biomass. Brazil, Germany, and the USA increased Solar energy output over the years. France, Australia, and China invested mainly in Geothermal.

China and the USA also increased their Hydro energy production over the years.

The UK has targeted multiple sources, including Hydro, Geothermal, and Solar energy, and increased production of all three types.

Production from some energy types declined over the years, and which types declined differs by country.

Hydro declined for most countries, while Solar and Geothermal energy seem to be on the rise.

The UK example shows how irregular renewable production has been over the years. All in all, only Solar energy picked up; the rest declined in recent years in the UK.
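
One way to put numbers on statements like "increased" or "declined" is to fit a straight line to yearly mean production for each country and energy type. The sketch below is an assumption about how this could be done, not the method used in the notebook: the helper name `production_trends` and the toy values are hypothetical, and a linear slope is a crude summary for series as irregular as those described above.

```python
import numpy as np
import pandas as pd

def production_trends(df, value_col="Production (GWh)"):
    """Least-squares slope (units per year) of yearly mean production
    for each (Country, Energy Type) pair. Hypothetical helper."""
    yearly = (df.groupby(["Country", "Energy Type", "Year"])[value_col]
                .mean()
                .reset_index())
    slopes = {}
    for key, grp in yearly.groupby(["Country", "Energy Type"]):
        if grp["Year"].nunique() < 2:
            slopes[key] = np.nan  # a slope needs at least two distinct years
            continue
        # Centering the years keeps the fit well conditioned; slope is unchanged
        x = grp["Year"] - grp["Year"].mean()
        slopes[key] = float(np.polyfit(x, grp[value_col], 1)[0])
    return pd.Series(slopes, name="slope_per_year")

# Toy data (illustrative values only, not taken from the dataset)
toy = pd.DataFrame({
    "Country": ["A"] * 8,
    "Energy Type": ["Solar"] * 4 + ["Hydro"] * 4,
    "Year": [2000, 2001, 2002, 2003] * 2,
    "Production (GWh)": [100.0, 110.0, 120.0, 130.0,
                         200.0, 190.0, 180.0, 170.0],
})
trends = production_trends(toy)
print(trends)
```

A positive slope suggests rising production and a negative one a decline; slopes close to zero relative to the year-to-year scatter would be consistent with no discernible trend.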

Loading and surveying data

import pandas as pd
import os
import matplotlib.pyplot as plt


# Loading and surveying data

# List the files in the 'data' directory
print(os.listdir('data'))

# Read the CSV file from the 'data' directory
df = pd.read_csv('data/Training_set_augmented.csv')
print(df.head())
print('\n***info on columns of dataframe:\n')
print(df.info())

# Check categories
print('\n***Countries:')
print(df['Country'].unique())

print('\n***Years from to:')
print(df['Year'].min(), df['Year'].max())

print('\n***Renewable energy types')
print(df['Energy Type'].unique())

print('\n***Nominals:')
# Inside brackets no line-continuation backslash is needed
cols = ['Government Policies', 'Renewable Energy Targets',
        'Public-Private Partnerships in Energy', 'Energy Market Liberalization']
for col in cols:
    print(f' {col} {df[col].unique()}')

Renewable Energy by country and energy type


#Renewable energy by country and energy type

print('\n***Average proportion of energy from renewables:')
print(df['Proportion of Energy from Renewables'].mean())

print('\n***Proportion of renewable energy by country:')
df_country = df.groupby('Country')['Proportion of Energy from Renewables'].mean().sort_values(ascending=False)
print(df_country)

# Plotting the bar plot for proportion of renewable energy by country
plt.figure(figsize=(10, 6))
df_country.plot(kind='bar')
plt.title('Proportion of Renewable Energy by Country')
plt.xlabel('Country')
plt.ylabel('Proportion of Energy from Renewables')
plt.show()

print('\n***Proportion of energy from renewables by energy type:')
df_et=df.groupby('Energy Type')['Proportion of Energy from Renewables'].mean().sort_values(ascending=False)

# Renewable energy by type
print(df_et)
df_et.plot(kind='bar')
plt.title('Renewable energy by type')
plt.xlabel('Energy type')
plt.ylabel('Proportion of Energy from Renewables')
plt.show()

# The value printed below is a mean per energy type, not a total
print('\n***Average production (GWh) by renewable energy type:')
print(df.groupby('Energy Type')['Production (GWh)'].mean().sort_values(ascending=False))

Summary:

The overall average proportion of renewables is about 49%, with little difference between countries.

The UK has the highest share at about 60%, roughly 10 percentage points above the average. Brazil and Japan trail the list at about 46%.

There is little difference in renewable proportion among the energy types; Hydro is slightly higher than the rest.
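
The gap between a country and the overall average can be expressed either in percentage points or as a relative percentage, and the two readings differ noticeably. The sketch below shows both; the helper name `country_gaps` and the toy values are assumptions for illustration, and only the column names `Country` and `Proportion of Energy from Renewables` come from the notebook.

```python
import pandas as pd

def country_gaps(df, col="Proportion of Energy from Renewables",
                 group_col="Country"):
    """Per-country mean share and its gap to the overall mean.

    Hypothetical helper. `gap_pp` is the difference in percentage points;
    `gap_relative_pct` is the difference relative to the overall mean.
    """
    overall = df[col].mean()  # mean over all rows, as in the notebook
    by_country = df.groupby(group_col)[col].mean()
    gaps = pd.DataFrame({
        "mean": by_country,
        "gap_pp": by_country - overall,
        "gap_relative_pct": (by_country / overall - 1.0) * 100.0,
    })
    return gaps.sort_values("gap_pp", ascending=False)

# Toy data (illustrative values only, not taken from the dataset)
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Proportion of Energy from Renewables": [60.0, 60.0, 40.0, 40.0],
})
gaps = country_gaps(toy)
print(gaps)
```

In the toy frame, country A sits 10 percentage points above the overall mean of 50, which is 20% higher in relative terms.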

Correlation of features in the data


import seaborn as sns
import matplotlib.pyplot as plt

# Correlation of variables
print('\n***Correlation of renewable energy proportion to all other factors:\n')
# Predictors: drop the identifier columns and both targets, then one-hot encode any remaining categoricals
X = pd.get_dummies(df.drop(['Country', 'Energy Type', 'Production (GWh)', 'Proportion of Energy from Renewables'], axis=1))
y1 = df['Proportion of Energy from Renewables']
y2 = df['Production (GWh)']

# Calculate correlation of X with y1 and y2 separately
cormat_y1 = X.corrwith(y1)
cormat_y2 = X.corrwith(y2)
cormat=pd.DataFrame({'Proportion of renewables':cormat_y1,'Production (GWh)' :cormat_y2})
print(cormat)

print('\n***Heatmaps of cross correlation of variables:\n')
X = df[['Investments (USD)', 'Energy Consumption', 'Energy Storage Capacity', 'Grid Integration Capability', 'Electricity Prices', 'Energy Subsidies', 'Production (GWh)']]

# Calculate the correlation matrix
corr_matrix = X.corr()

# Plot the heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()

# Select another set of variables
Xp = df[['Solar Irradiance', 'Wind Speed', 'Hydro Potential', 'Biomass Availability','Production (GWh)']]
corr_matrix=Xp.corr()
sns.heatmap(corr_matrix, annot=True,cmap='coolwarm')
plt.show()

print('\n***Representative scatterplots of renewable energy vs. potential controllers:')

# Numeric predictors to plot against the renewable proportion
columns = ['Year', 'Investments (USD)', 'GDP']

for col in columns:
    sns.lmplot(x=col, y='Proportion of Energy from Renewables', data=df)
    plt.title(col)
    plt.show()

cols = ['Government Policies', 'Energy Market Liberalization']
for col in cols:
    sns.barplot(data=df, x=col, y='Proportion of Energy from Renewables')
    plt.title(col)
    plt.show()