Load Forecasting and Demand Analytics

What You’ll Learn

By the end of this chapter, you will understand why accurate load forecasting is critical for grid operations and market participation. You’ll learn ARIMA time series modeling for capturing autoregressive patterns in demand. You’ll see how LSTM neural networks can capture nonlinear relationships in load data. You’ll evaluate forecast accuracy using appropriate metrics like RMSE and MAPE, and you’ll recognize the operational impact of forecast errors and how ML reduces them.


The Business Problem: Balancing Supply and Demand in Real Time

Electric power systems are unique among industries in that supply and demand must remain balanced continuously. Electricity is consumed the instant it is produced, and large-scale storage remains limited. This makes accurate demand forecasting central to every decision a utility makes.

If forecasts overestimate demand, utilities commit more generation than needed, incurring unnecessary costs and running plants inefficiently. If forecasts underestimate demand, the grid faces shortages, risking frequency drops, emergency dispatch of expensive peaking units, or even load shedding. This balancing act is complicated by rising demand volatility: distributed solar generation shifts net load profiles, electric vehicles create new evening peaks, and extreme weather events drive sudden spikes in heating and cooling loads.

A utility I worked with had a forecast error that cost them $50,000 in a single day because they had to dispatch expensive peaking units when they didn’t need to. That’s real money, and it happens more often than you’d think.

Historically, utilities relied on deterministic or statistical models built on historical patterns. These methods assumed that tomorrow would look much like yesterday, adjusted for seasonal effects and known events. That assumption no longer holds in an era of climate-driven weather extremes, rapid electrification, and high DER penetration. The cost of forecast errors is increasing, not just financially but in terms of reliability and customer trust.


The Analytics Solution: Data-Driven Load Forecasting

Modern load forecasting applies machine learning to capture complex relationships between weather, calendar effects, and consumption behavior. Temperature remains the single largest driver of demand in most regions, but other factors—humidity, wind, cloud cover, time of day, day of week, and holidays—play significant roles. For grids with significant rooftop solar, net load must account for both consumption and behind-the-meter generation.

Short-term forecasts (minutes to hours ahead) help operators adjust dispatch and manage ramping constraints. Day-ahead forecasts guide market bids and generator commitments. Long-term forecasts inform capital planning, feeder upgrades, and rate design. Each horizon demands different models and data granularity, but they share a common goal: reduce uncertainty and enable better operational and financial decisions.

Machine learning enhances forecasting by learning nonlinear interactions. For example, demand rises sharply once temperatures cross certain thresholds, reflecting air conditioning saturation effects. Neural networks can capture such nonlinearities better than traditional linear regression. Time series models like ARIMA or SARIMA remain valuable for capturing autoregressive patterns, while ensemble methods blend multiple models to improve accuracy.


Understanding ARIMA Models

ARIMA (AutoRegressive Integrated Moving Average) captures temporal patterns in time series. The AutoRegressive component means current values depend on previous values. The Integrated component uses differencing to make the series stationary. The Moving Average component means current values depend on previous forecast errors.

ARIMA models are specified as ARIMA(p, d, q) where p is autoregressive terms, d is differencing, and q is moving average terms. For load forecasting, ARIMA models excel at capturing daily and weekly patterns. The code uses ARIMA(3, 1, 2).


Linking Forecasting to Utility Operations

Accurate forecasts drive every part of utility operations. For system operators, improved short-term forecasts reduce reliance on expensive reserves and minimize frequency excursions. For distribution planners, neighborhood-level forecasts flag feeders at risk of overload as electric vehicle adoption accelerates. For energy traders, better day-ahead forecasts sharpen market positions, reducing exposure to price volatility.

Even customer-facing programs rely on forecasting. Demand response events require accurate predictions of when peak conditions will occur to maximize participation and avoid unnecessary interruptions. Time-of-use rates depend on understanding how customers shift load in response to pricing signals.

Load forecasting is also intertwined with renewable integration. As rooftop solar expands, midday net load dips, while steep evening ramps strain peaking plants. Forecasting both gross load and distributed generation is critical to managing this two-sided variability. Utilities that fail to adapt risk both operational stress and financial penalties in competitive markets.

Xcel Energy’s load forecasting team had the same problem a lot of utilities face: they could run their forecasts maybe once a year, and the whole process was a mess—spreadsheets, disparate systems, desktop tools, third-party vendors, data flowing in and out manually. Their demand forecasting feeds into long-range resource plans that they have to submit to regulators, so accuracy and detail matter.

Here’s what changed: they migrated to a modern data platform and got access to granular smart meter data from about 4 million meters. Suddenly they could work with feeder-level data, not just aggregated numbers. The combination of that granularity plus the compute power to actually process it has been transformative. Cindy Hoffman, their Director of Enterprise Data Strategy & AI, told me they started it as a proof of concept just to see if it would work, and it’s proving out. They’re building distribution forecasts that feed into load forecasting, which feeds into demand forecasting, which feeds into resource plans. The predictive models they’re putting in place now work at a level of detail and granularity that wasn’t feasible before. Some of their service territories are seeing massive data center expansion and rapid load growth, so this capability is critical for their regulatory filings.


Forecast Accuracy Metrics

Before building models, it’s important to understand how we measure forecast quality. RMSE measures average prediction error magnitude. MAPE shows error as a percentage of actual values, useful for comparing forecasts across different load levels. MAE is less sensitive to outliers than RMSE. Forecast uncertainty is captured through confidence intervals—wider intervals indicate higher uncertainty.


Building Load Forecasting Models

Let’s walk through two complementary approaches to load forecasting: ARIMA for capturing temporal patterns and LSTM neural networks for learning complex nonlinear relationships. Both methods are essential tools in modern utility forecasting.

First, we generate synthetic load data:

def generate_synthetic_load():
    """Generate synthetic hourly load data for one year."""
    date_rng = pd.date_range(
        start=config["data"]["start_date"],
        periods=config["data"]["periods"],
        freq="h"
    )
    base = config["data"]["base_load"]
    seasonal = config["data"]["seasonal_amplitude"]
    daily = config["data"]["daily_cycle_amplitude"]
    noise_std = config["data"]["noise_std"]
    
    base_load = base + seasonal * np.sin(2 * np.pi * date_rng.dayofyear / 365)
    daily_cycle = daily * np.sin(2 * np.pi * date_rng.hour / 24)
    noise = np.random.normal(0, noise_std, len(date_rng))
    load = base_load + daily_cycle + noise
    
    return pd.DataFrame({"timestamp": date_rng, "Load_MW": load})

This creates one year of hourly load data with seasonal patterns, daily cycles, and realistic noise. The data simulates what utilities collect from their systems, with clear daily cycles (higher during day, lower at night) and seasonal patterns (summer peaks for cooling, winter peaks for heating). In practice, you’d pull this from your SCADA or AMI systems.

Next, let’s visualize what we’re working with:

def plot_load(df):
    """Plot hourly load."""
    fig, ax = plt.subplots(figsize=config["plotting"]["figsize"])
    ax.plot(df["timestamp"], df["Load_MW"], color=config["plotting"]["colors"]["load"])
    ax.set_title("Hourly Load (MW) Over One Year")
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.tight_layout()
    plt.savefig(config["plotting"]["output_files"]["load"])
    plt.close()

This plot helps identify data quality issues and understand load patterns before modeling. Always inspect your data visually before building models. I’ve seen forecasts fail because someone didn’t notice missing data or outliers that broke the model.

Now we apply ARIMA forecasting:

def arima_forecast(df):
    """Forecast using ARIMA."""
    ts = df.set_index("timestamp")["Load_MW"]
    order = tuple(config["arima"]["order"])
    model = ARIMA(ts, order=order)
    fit = model.fit()
    forecast = fit.forecast(steps=config["arima"]["forecast_steps"])

    fig, ax = plt.subplots(figsize=config["plotting"]["figsize"])
    ax.plot(ts[-config["arima"]["forecast_steps"]:], 
             label="Observed", color=config["plotting"]["colors"]["observed"])
    ax.plot(forecast, label="ARIMA Forecast", 
             color=config["plotting"]["colors"]["forecast"])
    ax.set_title("Load (MW) - ARIMA Forecast vs Observed")
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.legend()
    plt.tight_layout()
    plt.savefig(config["plotting"]["output_files"]["arima"])
    plt.close()

ARIMA captures autoregressive patterns—how past load values influence future demand. The forecast plot displays the last week of observed data alongside the one-week forecast. A good forecast should follow the general trend and capture daily cycles. ARIMA is my go-to for day-ahead forecasts because it’s interpretable and reliable.

Finally, we demonstrate LSTM forecasting (if Darts is available):

def lstm_forecast(df):
    """Forecast using LSTM (Darts)."""
    if not HAS_DARTS:
        print("Darts not available; skipping LSTM forecast.")
        return
    
    df = df.copy()
    df["Load_MW"] = df["Load_MW"].astype(np.float32)
    series = TimeSeries.from_dataframe(df, "timestamp", "Load_MW")
    train, val = series[:-config["arima"]["forecast_steps"]], series[-config["arima"]["forecast_steps"]:]
    
    model = RNNModel(
        model="LSTM",
        input_chunk_length=config["lstm"]["input_chunk_length"],
        output_chunk_length=config["lstm"]["output_chunk_length"],
        training_length=config["lstm"]["training_length"],
        n_epochs=config["lstm"]["n_epochs"],
        random_state=config["model"]["random_state"],
        pl_trainer_kwargs={"accelerator": "cpu", "devices": 1, "enable_progress_bar": False},
    )
    model.fit(train)
    pred = model.predict(len(val))

    fig, ax = plt.subplots(figsize=config["plotting"]["figsize"])
    train[-config["arima"]["forecast_steps"]:].plot(
        ax=ax, label="Observed", lw=1, color=config["plotting"]["colors"]["observed"])
    pred.plot(ax=ax, label="LSTM Forecast", lw=2, color=config["plotting"]["colors"]["forecast"])
    ax.set_title(f"Load (MW) - LSTM Forecast vs Observed (RMSE: {rmse(val, pred):.2f})")
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.legend()
    plt.tight_layout()
    plt.savefig(config["plotting"]["output_files"]["lstm"])
    plt.close()

LSTMs can capture nonlinear relationships and long-term dependencies that ARIMA might miss. The RMSE score in the title quantifies forecast accuracy—lower values indicate better predictions. In production, utilities track multiple metrics over time to monitor forecast quality and detect model drift. I’ve seen LSTMs outperform ARIMA by 2-3% MAPE, but they’re harder to interpret and require more data.

The complete, runnable script is at content/c4/load_forecasting.py. Try both approaches and see which works better for your use case.


What I Want You to Remember

Forecast accuracy directly impacts operations. Even small improvements in forecast error, such as a 1-2% MAPE reduction, translate to significant cost savings through better generation scheduling and reduced reserve requirements. I’ve seen utilities save hundreds of thousands of dollars annually from just a 1% improvement in forecast accuracy. Multiple models serve different purposes. ARIMA excels at capturing temporal patterns and is interpretable. LSTMs capture complex nonlinearities but require more data and computational resources.

Time series structure matters. Load data has strong daily and weekly patterns. Models that ignore these patterns, such as simple regression, will underperform. Forecast horizons require different approaches. Short-term forecasts measured in hours can use recent patterns. Day-ahead forecasts need weather inputs. Long-term forecasts require trend and growth modeling.

Uncertainty quantification is critical. Point forecasts alone aren’t enough. Operators need confidence intervals to plan reserves and manage risk. I learned this from my ExxonMobil project: the forecast was right, but I didn’t communicate the uncertainty well enough.


What’s Next

In Chapter 5, we’ll shift from forecasting to predictive maintenance—using ML to anticipate equipment failures before they cause outages. You’ll learn how classification and anomaly detection help utilities prioritize maintenance resources. The principles are similar, but the use case is different.


Code

"""Chapter 4: Load Forecasting and Demand Analytics."""

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yaml
from pathlib import Path
from statsmodels.tsa.arima.model import ARIMA

# Optional: Darts dependencies
HAS_DARTS = True
try:
    from darts import TimeSeries
    from darts.models import RNNModel
    from darts.metrics import rmse
except Exception:
    HAS_DARTS = False

# Load config
config_path = Path(__file__).parent / "config.yaml"
with open(config_path) as f:
    config = yaml.safe_load(f)

np.random.seed(config["model"]["random_state"])


def generate_synthetic_load():
    """Generate synthetic hourly load data for one year."""
    date_rng = pd.date_range(
        start=config["data"]["start_date"],
        periods=config["data"]["periods"],
        freq="h"
    )
    base = config["data"]["base_load"]
    seasonal = config["data"]["seasonal_amplitude"]
    daily = config["data"]["daily_cycle_amplitude"]
    noise_std = config["data"]["noise_std"]
    
    base_load = base + seasonal * np.sin(2 * np.pi * date_rng.dayofyear / 365)
    daily_cycle = daily * np.sin(2 * np.pi * date_rng.hour / 24)
    noise = np.random.normal(0, noise_std, len(date_rng))
    load = base_load + daily_cycle + noise
    
    return pd.DataFrame({"timestamp": date_rng, "Load_MW": load})


def plot_load(df):
    """Plot hourly load."""
    fig, ax = plt.subplots(figsize=config["plotting"]["figsize"])
    ax.plot(df["timestamp"], df["Load_MW"], color=config["plotting"]["colors"]["load"])
    ax.set_title("Hourly Load (MW) Over One Year")
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.tight_layout()
    plt.savefig(config["plotting"]["output_files"]["load"])
    plt.close()


def arima_forecast(df):
    """Forecast using ARIMA."""
    ts = df.set_index("timestamp")["Load_MW"]
    order = tuple(config["arima"]["order"])
    model = ARIMA(ts, order=order)
    fit = model.fit()
    forecast = fit.forecast(steps=config["arima"]["forecast_steps"])

    fig, ax = plt.subplots(figsize=config["plotting"]["figsize"])
    ax.plot(ts[-config["arima"]["forecast_steps"]:], 
             label="Observed", color=config["plotting"]["colors"]["observed"])
    ax.plot(forecast, label="ARIMA Forecast", 
             color=config["plotting"]["colors"]["forecast"])
    ax.set_title("Load (MW) - ARIMA Forecast vs Observed")
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.legend()
    plt.tight_layout()
    plt.savefig(config["plotting"]["output_files"]["arima"])
    plt.close()


def lstm_forecast(df):
    """Forecast using LSTM (Darts)."""
    if not HAS_DARTS:
        print("Darts not available; skipping LSTM forecast.")
        return
    
    df = df.copy()
    df["Load_MW"] = df["Load_MW"].astype(np.float32)
    series = TimeSeries.from_dataframe(df, "timestamp", "Load_MW")
    train, val = series[:-config["arima"]["forecast_steps"]], series[-config["arima"]["forecast_steps"]:]
    
    model = RNNModel(
        model="LSTM",
        input_chunk_length=config["lstm"]["input_chunk_length"],
        output_chunk_length=config["lstm"]["output_chunk_length"],
        training_length=config["lstm"]["training_length"],
        n_epochs=config["lstm"]["n_epochs"],
        random_state=config["model"]["random_state"],
        pl_trainer_kwargs={"accelerator": "cpu", "devices": 1, "enable_progress_bar": False},
    )
    model.fit(train)
    pred = model.predict(len(val))

    fig, ax = plt.subplots(figsize=config["plotting"]["figsize"])
    train[-config["arima"]["forecast_steps"]:].plot(
        ax=ax, label="Observed", lw=1, color=config["plotting"]["colors"]["observed"])
    pred.plot(ax=ax, label="LSTM Forecast", lw=2, color=config["plotting"]["colors"]["forecast"])
    ax.set_title(f"Load (MW) - LSTM Forecast vs Observed (RMSE: {rmse(val, pred):.2f})")
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.legend()
    plt.tight_layout()
    plt.savefig(config["plotting"]["output_files"]["lstm"])
    plt.close()


if __name__ == "__main__":
    df_load = generate_synthetic_load()
    plot_load(df_load)
    arima_forecast(df_load)
    lstm_forecast(df_load)