Skip to content

What's Up with Victorian Electricity Prices?

Analysing historical data to inform future contract negotiations and storage venture profitability

Introduction

Electricity price forecasting (EPF) has become an increasing area of interest and is commonly divided into short- to medium-term forecasting (hour-ahead or day-ahead forecasting) and medium- to long-term forecasting (month/s-ahead to year/s-ahead forecasting). Of the two, the latter is arguably the more difficult as it typically involves having to model and/or anticipate long-term variations in the relationship between electricity prices and relevant predictors. As a result, compared to the data-driven approach commonly used for short-term prediction, long-term EPFs have typically been obtained through the use of econometric, market-based models.

In spite of this, the primary objectives of the current analysis were to:

  1. Investigate and provide a summary of how electricity prices changed over the course of a year
  2. Develop a data-driven model that could provide a reasonable projection of daily Victorian electricity prices up to eighteen mmonths ahead and be used as a basis for future contract negotiations, and
  3. Investigate and provide guidance on how much revenue could be generated by the proposed energy storage venture.

The author found that:

  1. Both the retail price and average positive price time series were characterised by extreme price spikes typically occurring in the month of January.
  2. Once these were removed, the annual patterns in the retail price data were more apparent. That is:
    • On average, prices were lowest in summer, highest in winter and middling in Autumn and spring.
    • On a calendar year basis, retail prices typically rose throughout the first half of the year to peak in June before consistently declining again until December.
    • Retail prices were typically lowest in January and February.
    • Whilst remaining consistent with the previously-observed trends, prices also varied throughout each month when aggregated to a weekly period.
    • Prices were noticably lower over the weekend compared to weekdays.
  3. There were no strong relationships, linear or otherwise, between any of the current predictor variables and retail prices. A Spearman's correlation matrix indicated that the strongest linear relationships were with demand and historical lags of price. A series of scatter plots indicated that there was:
    • A positive relationship with demand
    • A u-shaped relationship with both minimum and maximum temperature
    • An upside-down u-shaped relationship with year (although this likely was due to external policy settings)
    • Very slight (and possibly statistically insignificant) upside-down u-shaped relationships with month, day of the year, and week of the year
    • Positive and approximately linear relationships with historical lags of itself (retail prices)
    • Relationships consistent with what had previously been observed in finding 2.
  4. Of the seven models developed, an ARIMA(5,1,2)(0,0,4) performed the best on the last eighteen months of retail price data set aside for model assessment. In addition, this was the only model that performed better than the naive benchmark model, which simply predicted future values as the last historical value with a seven day lag.
  5. The use of a battery storage system could generate revenue of up to 302,000 AUD per annum. However, the derivation of this figure required some fairly strong assumptions that do not accurately reflect the electricity market and/or the properties of a battery (such as perfect day-ahead forecasting and no capacity/efficiency degredation over time.

It is recommended that:

  1. The ARIMA model should be chosen as the starting point for future planning, but forecasts taken with a grain of salt at present given disappointing performance in relation to predicting unexpected changes in price trends.
  2. Additional predictor variables sought and considered for inclusion as exogenous regressors that might be better correlated with electricity prices; for example network supply and generation costs or leading indicators of retail prices.
  3. The chosen model developed to the point that forecasts are sufficiently robust and historical lags of the retail price can be considered for inclusion.
  4. A more extensive feature selection process conducted to determine if there is a specific combination of features that can more accurately predict Victorian electricity prices.
  5. A more robust methodology developed to estimate revenue potential from the use of a 70 MWh battery storage system.

Data and Methods

The authors were provided with over five years of daily energy prices (2,106 observations) sourced from (Kaggle) along with a number of potential predictor variables, such as the date of the observation, temperature, rainfall, and whether or not it was a school day or holiday. See Table 1 below for details:

VariableDescription
datefrom January 1, 2015, to October 6, 2020
demanddaily electricity demand in MWh
pricerecommended retail price in AUD/MWh
demand_pos_pricetotal daily demand at a positive price in MWh
price_positiveaverage positive price, weighted by the corresponding intraday demand in AUD/MWh
demand_neg_pricetotal daily demand at a negative price in MWh
price_negativeaverage negative price, weighted by the corresponding intraday demand in AUD/MWh
frac_neg_pricethe fraction of the day when the demand traded at a negative price
min_temperatureminimum temperature during the day in Celsius
max_temperaturemaximum temperature during the day in Celsius
solar_exposuretotal daily sunlight energy in MJ/m^2
rainfalldaily rainfall in mm
school_day"Y" if that day was a school day, "N" otherwise
holiday"Y" if the day was a state or national holiday, "N" otherwise

The analysis was divided into three parts. The first part involved a fairly extensive exploratory data analysis in order to observe any major trends or patterns in the provided data and to determine the major drivers or relationships between energy prices and the feature variables. This included answering the first part of the brief, which was to identify how electricity prices changed over the course of the year.

The second part used the results of the first to inform the development of several potential forecasting models, including a benchmark naive model, an Autoregressive Integrated Moving Average (ARIMA) model with and without exogenous regressors, a time series linear model, a Prophet model, and a feed-forward neural network. The performance of all models was assessed using a holdout set of the last 548 days of data, but where possible, performance was also assessed using time series cross-validation.

The final part was devoted solely to answering the final main objective, which was to determine the annual revenue potential from the use of a 70 MWh storage system to capture energy when pricing conditions are unfavourable and sell it by the next day if prices are higher. In practically, the approach taken by the author for this section was to simply determine the price differential between the retail price of any single day and the immediately following day. If the price differential was positive; that is, the retail pricewas higher on the following day, then the battery was charged to capacity and the revenue calculated by multiplying the price differential by the maximum capacity of the battery. If the price differential was 0 or negative, then revenue was set to zero. There were a large number of assumptions underlying this simple methodology, which have been detailed in Part C of the Results and Analysis section of the report.

The analysis was conducted entirely in R, using primarily the tidyverse and tidyverts package families.


Results and Analysis

Part A - Exploratory Data Analysis
# suppress warnings
options(warn=-1)
install.packages(c("corrplot", "cowplot", "fable.prophet", "feasts", "fable", "fabletools", "naniar", "tsibble", "urca"))
Hidden output
# general libraries
library(tidyverse)
library(RColorBrewer)
library(scales)
library(cowplot)
library(corrplot)
library(naniar)

# libraries for time series analysis and forecasting
library(tsibble)
library(lubridate)
library(zoo)
library(feasts)
library(fable)
library(fable.prophet)
Hidden output
# function to add an overall title to any grid of ggplots
addJointTitle <- function(title_text) {
    title <- ggdraw() +
    draw_label(title_text,
               size = 18,
               x = 0,
               hjust = 0) +
    theme(
      plot.margin = margin(t = 0, r = 0, b = 0, l = 7)
    )
}

# function to define a reversed 'not in' operator
'%!in%' <- function(x, y)!('%in%'(x, y))

# function to calculate the significance of the correlations
# mat : is a matrix of data
# ... : additional arguments to pass to the native R cor.test function
corMtest <- function(mat, ...) {
    mat <- as.matrix(mat)
    n <- ncol(mat)
    p_mat<- matrix(NA, n, n)
    diag(p_mat) <- 0
    for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
            tmp <- cor.test(mat[, i], mat[, j], ...)
            p_mat[i, j] <- p_mat[j, i] <- tmp$p.value
        }
    }
    colnames(p_mat) <- rownames(p_mat) <- colnames(mat)
    p_mat
}

# gg_tsresiduals - time series plot, ACF and histogram of model residuals
plotggtsresiduals <- function(model) {
    options(repr.plot.width=20, repr.plot.height=10)
    p1 <- model %>% augment() %>% autoplot(.vars = .innov) + geom_point() + theme_report()
    p2 <- model %>% augment() %>% ACF(.innov) %>% autoplot() + theme_report()
    p3 <- model %>% augment() %>% ggplot(aes(.innov)) + geom_histogram(binwidth = 30) + geom_rug() + theme_report()

    plot_grid(p1, p2, p3)
}

plotggtsresiduals2 <- function(model, plot_title) {
    #options(repr.plot.width=20, repr.plot.height=8)
    title = addJointTitle(plot_title)
    p1 <- model %>% augment() %>% autoplot(.vars = .innov) + geom_point() + theme_report()
    p2 <- model %>% augment() %>% ACF(.innov) %>% autoplot() + theme_report()

    plot_plots <- plot_grid(p1, p2, ncol = 2)
    plot_grid(title, plot_plots, ncol = 1, rel_heights = c(0.1, 1.5))
}

# gg_tsdisplay - time series plot, ACF, PACF, histogram, 
plotggtsdisplay <- function(data, var, diff_var) {
    options(repr.plot.width=20, repr.plot.height=10)
    p1 <- data %>% autoplot(.vars = ) + geom_point() + theme_report()
    p2 <- data %>% ggplot(aes()) + geom_histogram(binwidth = 30) + geom_rug() + theme_report()
    p3 <- data %>% ACF() %>% autoplot() + theme_report()
    p4 <- data %>% PACF() %>% autoplot() + theme_report()

    plot_grid(p1, p2, p3, p4)
}

# customised ggplot2 themes
theme_report <- function() {
    theme_classic() +
    theme(
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14),
        plot.title = element_text(size = 16),
        legend.text = element_text(size = 12),
        legend.title = element_text(size = 14)
    )
}

theme_report_angled <- function() {
    theme_classic() +
    theme(
        axis.text.x = element_text(angle = 30, hjust = 1),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14),
        plot.title = element_text(size = 16),
        legend.text = element_text(size = 12),
        legend.title = element_text(size = 14)
    )
}
df_orig <- readr::read_csv('./data/energy_demand.csv')

df_orig <- df_orig %>% select(date, price, everything())

# convert to tsibble
df_orig <- as_tsibble(df_orig, index = date)
str(df_orig)
summary(df_orig)
Hidden output

From Figures 1 and 2, it is immediately obvious that the average positive price very much mirrored the overall recommended retail price and, perhaps more importantly, that the series was characterised by several large price spikes during the month of January. For the purposes of robust model development, outlying price spikes such as these were identified, removed and then interpolated using a basic ARIMA model. The identification process consisted of decomposing each price time series into its respective trend, seasonal and remainder components with a Seasonal and Trend using Loess (STL) method and then defining an outlier as any observation in which the remainder component was outside the 25th or 75th quantile plus or minus 3 times the interquartile range (IQR).^[1]

As had been hoped, this made very little difference to the overall distributions of the recommended retail price and the average positive price time series whilst showing the actual trends and patterns in the series (See Table 1 and Figure 4). The methodology failed to produce satisfactory results when applied to the average negative price time series, and therefore this variable was left in its unadjusted/original form. Finally, note that from this point forward, unless otherwise specified, the term 'retail price' means the recommended retail price with outliers removed, not the original series.

[1]: That is, less than the 25th quantile - 3 * IQR or more than the 75th quantile + 3 * IQR.

date_breaks <- seq(min(df_orig$date), max(df_orig$date), by = "6 month")

i <- 1
letters <- c('a', 'b', 'c')
titles <- c("Recommended retail price, January 2015 to October 2020", 
            "Average positive price, January 2015 to October 2020", 
            "Average negative price, January 2015 to October 2020")

for (col in c("price", "price_positive", "price_negative")) {
    ggpl <- ggplot(df_orig, aes(x = date, y = .data[[col]])) +
    geom_line(colour = "steelblue", size = 0.3) +
    scale_x_date(breaks = date_breaks, labels = date_format("%b %Y")) +
    labs(x = NULL, 
         y = "Price (AUD/MWh)",
         title = paste0("Figure ", i, ": ", titles[i])) +
    theme_report_angled()
    
    assign(paste0("p", i), ggpl)
    i <- i + 1
}

# change plot dimensions 
options(repr.plot.width=10, repr.plot.height=5)
p1
p2
p3
# make a copy of the dataframe
df <- data.table::copy(df_orig)

# decompose the prices and identify outliers using the IQR
outlier_decomp <- df %>%
    model(stl = STL(price ~ season(period = 1), robust = TRUE)) %>% 
    components() %>% 
    filter(remainder < quantile(remainder, 0.25) - 3*IQR(remainder) | remainder > quantile(remainder, 0.75) + 3*IQR(remainder))

outlier_miss <- df %>% 
    anti_join(outlier_decomp, by = c("date", "price")) %>% 
    fill_gaps()

# replace the values with an ARIMA-based interpolation
outlier_fill <- outlier_miss %>% 
    model(ARIMA(price)) %>% 
    interpolate(outlier_miss)

df$adjprice = outlier_fill$price

price_comp <- bind_rows(summary(df$price), summary(df$adjprice)) %>% mutate(Series = c("Actual", "Adjusted"))