⚡ “Powering the Future, One Watt at a Time.”

The Data Scientist Master

📖 Background

The race to net-zero emissions is heating up. As nations work to combat climate change and meet rising energy demands, renewable energy has emerged as a cornerstone of the clean transition. Solar, wind, and hydro are revolutionizing how we power our lives. Some countries are leading the charge, while others are falling behind. But which nations are making the biggest impact? What’s driving their success? And what lessons can we learn to accelerate the green energy transition?

As a data scientist at NextEra Energy, one of the world’s leading renewable energy providers, your role is to move beyond exploration and into prediction. Using a rich, real-world dataset, you’ll build models to forecast renewable energy production, drawing on indicators like GDP, population, carbon emissions, and policy metrics.

With the world watching, your model could help shape smarter investments, forward-thinking policies, and a faster transition to clean energy. 🔮⚡🌱

1. Data Preparation

import pandas as pd

# Load the dataset
file_path = 'data/Training_set_augmented.csv'  # Adjust path if needed
df = pd.read_csv(file_path)

# Show basic info
print("Data loaded successfully!")
print("Shape:", df.shape)
print("\n Column Types:")
print(df.dtypes)

# Preview first few rows
print("\n First 5 Rows:")
print(df.head())

# Check for missing values
print("\n Missing Values:")
print(df.isna().sum().sort_values(ascending=False))
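# Count duplicate rows, then drop them
print("\n Duplicate Rows:", df.duplicated().sum())
df = df.drop_duplicates()

# Convert column types
df['Year'] = df['Year'].astype(int)
df['Renewable Energy Targets'] = df['Renewable Energy Targets'].astype('category')
df['Country'] = df['Country'].astype('category')
df['Energy Type'] = df['Energy Type'].astype('category')

# Summary statistics for the numerical columns
print(df.describe())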

📝 Dataset Loading & Preprocessing Documentation

This document explains the data loading and preprocessing steps performed on the Training_set_augmented.csv dataset using Python and pandas.

  1. Importing pandas: The pandas library was imported to facilitate data manipulation and analysis.

  2. Loading the Dataset: The CSV file Training_set_augmented.csv was read into a DataFrame named df using pd.read_csv().

  3. Displaying Basic Info: A message confirmed that the data was loaded successfully. The shape (rows and columns) of the dataset was printed to understand its size.

  4. Checking Column Data Types: The dtypes attribute was used to inspect the data types of each column. This helps determine which columns need conversion for further analysis.

  5. Previewing First Rows: The first 5 records were displayed using df.head() to get a quick overview of the dataset's content and structure.

  6. Checking for Missing Values: df.isna().sum().sort_values(ascending=False) was used to check for missing (null) values in each column and sort them by count.

  7. Checking for Duplicates: df.duplicated().sum() was used to count the number of duplicate rows in the dataset.

  8. Removing Duplicates: df.drop_duplicates() removed duplicate entries to ensure data quality.

  9. Data Type Conversion:

    • The Year column was converted to an integer type for numerical operations.
    • The Renewable Energy Targets, Country, and Energy Type columns were converted to the category data type. This is useful for memory efficiency and improved performance on categorical operations such as filtering and grouping.
  10. Summary Statistics: df.describe() was used to generate basic statistical summaries for the numerical columns in the dataset, including count, mean, standard deviation, min, max, and percentiles.

This preprocessing ensures the dataset is clean, consistent, and ready for analysis or modeling.
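The cleaning steps listed above can be bundled into one reusable function. The following is a minimal sketch, not the notebook's own code: the function name `preprocess` and the toy rows are assumptions made for illustration, while the column names are the ones used in the cell above.

```python
import pandas as pd


def preprocess(df):
    """Drop duplicate rows and apply the dtype conversions described above.

    Returns the cleaned frame and the number of duplicate rows removed.
    (Hypothetical helper, not part of the original notebook.)
    """
    n_duplicates = int(df.duplicated().sum())
    cleaned = df.drop_duplicates().copy()
    cleaned['Year'] = cleaned['Year'].astype(int)
    for col in ['Renewable Energy Targets', 'Country', 'Energy Type']:
        cleaned[col] = cleaned[col].astype('category')
    return cleaned, n_duplicates


# Toy rows invented for illustration; they are not taken from the dataset
toy = pd.DataFrame({
    'Country': ['France', 'France', 'Germany'],
    'Year': [2020.0, 2020.0, 2021.0],
    'Energy Type': ['Solar', 'Solar', 'Wind'],
    'Renewable Energy Targets': [1, 1, 0],
    'Production (GWh)': [100.0, 100.0, 250.0],
})

cleaned, n_duplicates = preprocess(toy)
print("Duplicates removed:", n_duplicates)
print(cleaned.dtypes)
```

Returning the duplicate count alongside the cleaned frame keeps a record of how many rows were dropped. Note that `astype(int)` raises an error if `Year` contains missing values, so those would need handling first.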

2. Exploratory Data Analysis (EDA). All plots are interactive

Total Renewable Energy Production by Country

import plotly.express as px

# Aggregate total production by country
country_prod = df.groupby('Country', as_index=False)['Production (GWh)'].sum()

# Create the choropleth map
fig = px.choropleth(
    country_prod,
    locations='Country',
    locationmode='country names',
    color='Production (GWh)',
    hover_name='Country',
    color_continuous_scale='Viridis',
    title='Total Renewable Energy Production by Country (GWh)',
)

# Customize
fig.update_layout(
    geo=dict(showframe=False, showcoastlines=True),
    margin={"r":0,"t":50,"l":0,"b":0}
)

fig.show()

🌍 Renewable Energy Production Overview

France leads all countries in total renewable energy production with 13.48 million GWh, while the United Kingdom ranks lowest at 2.16 million GWh. Other countries fall between these two extremes.

📊 Interactive Graph

Use the toggle or dropdown controls on the graph to view additional countries not shown by default. This allows for flexible comparison across the full dataset.

🔎 Notes

  • Totals quoted above are in millions of GWh; the map itself is colored in GWh
  • Differences reflect national energy policies, infrastructure, and resource availability

France’s high output is consistent with strong investment in renewables, while the UK’s lower figure suggests room for growth.
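The highest and lowest producers can also be read directly from the aggregated totals rather than from the map. The following is a minimal sketch with invented toy rows: the country labels and values are placeholders, and only the column names `Country` and `Production (GWh)` come from the code above.

```python
import pandas as pd

# Toy rows invented for illustration; they are not taken from the dataset
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C'],
    'Production (GWh)': [10.0, 20.0, 5.0, 5.0, 50.0],
})

# Sum production per country and sort from highest to lowest.
# observed=True keeps only categories that actually occur, which matters
# once 'Country' has been converted to the category dtype.
totals = (
    toy.groupby('Country', observed=True)['Production (GWh)']
    .sum()
    .sort_values(ascending=False)
)

top_country = totals.index[0]
bottom_country = totals.index[-1]

# Express the totals in millions of GWh, the unit used in the commentary
totals_million = totals / 1e6

print("Highest:", top_country, totals[top_country])
print("Lowest:", bottom_country, totals[bottom_country])
```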

Installed Capacity per Capita

import plotly.express as px

# Aggregate total capacity and population by country
cap_pop = df.groupby('Country', as_index=False).agg({
    'Installed Capacity (MW)': 'sum',
    'Population': 'mean'
})

# Avoid division by zero
cap_pop = cap_pop[cap_pop['Population'] > 0]
cap_pop['MW per Capita'] = cap_pop['Installed Capacity (MW)'] / cap_pop['Population']

# Use Plotly to create a bubble map
fig = px.scatter_geo(
    cap_pop,
    locations='Country',
    locationmode='country names',
    size='MW per Capita',
    color='MW per Capita',
    hover_name='Country',
    size_max=40,
    projection='natural earth',
    color_continuous_scale='Plasma',
    title='Installed Renewable Capacity per Capita (MW/person)',
)

# Layout tweaks
fig.update_layout(
    geo=dict(showland=True, landcolor='lightgray'),
    margin=dict(l=0, r=0, t=50, b=0)
)

fig.show()

🌍 Renewable Energy Production per Capita

Based on the latest data:

  • France has the highest renewable energy production per capita at 0.00943 GWh.
  • United Kingdom ranks the lowest, with just 0.0016 GWh per capita.
  • All other countries fall between these two values.

📊 Interactive Graph

Use the toggle or dropdown controls in the graph to explore per capita production across all countries. Only a subset may be visible by default.

🔎 Notes

  • Figures are in gigawatt-hours (GWh) per capita
  • Values reflect differences in:
    • Renewable energy infrastructure
    • National energy policies
    • Population size relative to energy output

This metric highlights not just total production but how renewable energy access compares at the individual level. France’s leadership indicates both high production and favorable distribution, while the UK’s position suggests potential for growth in capacity or efficiency.
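The map code above divides installed capacity (MW) by population, so a production-per-capita figure in GWh needs its own calculation. One possible approach, sketched here under assumptions, is to total production per country and year, divide by that year's population, and then average across years. The toy rows are invented, and whether `Population` repeats on every row for a given country and year in the real dataset is an assumption to verify.

```python
import pandas as pd

# Toy rows invented for illustration; they are not taken from the dataset
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Year': [2020, 2020, 2021, 2020, 2021],
    'Production (GWh)': [100.0, 50.0, 200.0, 30.0, 60.0],
    'Population': [1000.0, 1000.0, 1000.0, 600.0, 600.0],
})

# Total production per country-year; population is assumed to repeat across
# rows within a country-year, so the mean recovers the single value
yearly = toy.groupby(['Country', 'Year'], as_index=False).agg(
    production=('Production (GWh)', 'sum'),
    population=('Population', 'mean'),
)

# Avoid division by zero, as in the capacity calculation above
yearly = yearly[yearly['population'] > 0]
yearly['GWh per Capita'] = yearly['production'] / yearly['population']

# Average the yearly per-capita values for each country
per_capita = (
    yearly.groupby('Country')['GWh per Capita']
    .mean()
    .sort_values(ascending=False)
)
print(per_capita)
```

Aggregating per year before dividing avoids the inflation that comes from summing production over many years while dividing by a single average population.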

Top Countries by Total Renewable Energy Production