Predictive Maintenance for Grid Assets

The Business Problem: The Cost of Unplanned Failures

Utility infrastructure is aging. Many transformers, breakers, and other critical components have been in service for decades, operating far beyond their original design lifespans. Routine maintenance schedules and periodic inspections have historically been the backbone of asset management. Crews replace components on fixed timelines or after visible deterioration, and major failures are often addressed reactively when they occur.

This approach is costly and inefficient. Unplanned failures disrupt service, trigger expensive emergency repairs, and can cascade into larger outages affecting thousands of customers. Each transformer failure or feeder trip carries not only repair costs but penalties for reliability metrics like SAIDI and SAIFI, as well as reputational damage. Worse, utilities often replace equipment too early, discarding assets that still have years of useful life because their age meets replacement criteria.

The challenge lies in predicting failures before they happen and targeting maintenance only where it is needed. Doing so requires leveraging data from sensors, inspection logs, and operational histories to assess asset health dynamically, rather than relying on static schedules. This is where predictive maintenance comes in.

The Analytics Solution: From Reactive to Predictive Maintenance

Predictive maintenance uses data-driven analytics to estimate an asset’s remaining useful life and detect signs of impending failure. Rather than replacing equipment based purely on age or fixed intervals, utilities can prioritize interventions based on condition. This minimizes both premature replacements and catastrophic failures.

SCADA systems already collect equipment-level telemetry such as transformer loading, oil temperature, and breaker operations. Modern deployments add IoT sensors measuring vibration, dissolved gas analysis (DGA), or partial discharge activity, each providing a window into asset health. Combined with asset registries tracking manufacturer, installation date, and maintenance history, these data sources feed machine learning models that predict failure risk.

One common approach is classification modeling: assets are labeled as failed or healthy based on historical outcomes, and the model learns to distinguish between them. Another is anomaly detection, flagging equipment whose sensor readings deviate from normal operating patterns. Time series models can also be used to identify slow degradation trends or predict remaining useful life.

These techniques transform maintenance from a calendar-driven process into a data-driven one, aligning scarce field resources with the equipment most in need of attention.

Real-World Impact

The business value of predictive maintenance is tangible. Reducing unplanned outages improves reliability metrics and customer satisfaction while lowering regulatory penalties. Targeted interventions reduce maintenance costs by avoiding unnecessary replacements and emergency overtime. Predicting transformer failures before they occur prevents feeder trips and associated cascading effects.

Moreover, predictive maintenance supports capital planning. Utilities can use risk scores to defer non-critical replacements, extending asset lifespans safely and freeing up budget for more urgent needs. Insights also inform procurement, as failure patterns may reveal manufacturer-specific issues or environmental factors affecting equipment longevity.

In practice, predictive maintenance often operates alongside preventive strategies. Routine inspections remain necessary, but they are augmented by analytics that direct crews toward equipment showing early warning signs. This hybrid approach maximizes the value of existing programs while gradually modernizing asset management.

Transition to the Demo

In this chapter’s demo, we will build a simplified predictive maintenance pipeline. Using simulated SCADA and asset data, we will:

Create a classification model that predicts transformer failure risk based on operating conditions such as temperature, vibration, and age.
Apply anomaly detection to flag equipment deviating from normal behavior, even without explicit failure labels.
Visualize how these predictions generate actionable asset risk rankings for maintenance prioritization.

By combining sensor data, asset attributes, and basic machine learning techniques, we will replicate the core workflow utilities use to modernize their maintenance programs. This exercise demonstrates how analytics can transform maintenance from reactive firefighting into proactive, targeted intervention, reducing costs and improving reliability in measurable ways.

pyfile shortcode: missing param 'file'. Example: {{< pyfile file="script.py" >}}

Code

"""
Chapter 5: Predictive Maintenance for Grid and Plant Assets
Uses synthetic SCADA sensor data to detect anomalies and predict equipment failures.
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.preprocessing import StandardScaler

def generate_synthetic_scada_data(samples=2000):
    """
    Generate synthetic SCADA data for transformer monitoring.
    Features: temperature, vibration, oil pressure, load.
    """
    np.random.seed(42)
    temp = np.random.normal(60, 5, samples)
    vibration = np.random.normal(0.2, 0.05, samples)
    oil_pressure = np.random.normal(25, 3, samples)
    load = np.random.normal(800, 100, samples)

    # Simulate failures: elevated temp/vibration correlated with failures
    failure_prob = 1 / (1 + np.exp(-(0.05*(temp-65) + 8*(vibration-0.25))))
    failures = np.random.binomial(1, failure_prob)

    return pd.DataFrame({
        "Temperature_C": temp,
        "Vibration_g": vibration,
        "OilPressure_psi": oil_pressure,
        "Load_kVA": load,
        "Failure": failures
    })

def plot_sensor_trends(df):
    """
    Plot sample SCADA signals over time.
    """
    plt.figure(figsize=(12, 5))
    plt.plot(df.index[:200], df["Temperature_C"][:200], label="Temperature (C)", color="black")
    plt.plot(df.index[:200], df["Vibration_g"][:200], label="Vibration (g)", color="gray")
    plt.xlabel("Time (Sample Index)")
    plt.ylabel("Sensor Readings")
    plt.title("Transformer Sensor Trends (Sample Window)")
    plt.legend()
    plt.tight_layout()
    plt.savefig("chapter5_sensor_trends.png")
    plt.show()

def anomaly_detection(df):
    """
    Apply Isolation Forest for anomaly detection.
    """
    features = df[["Temperature_C", "Vibration_g", "OilPressure_psi", "Load_kVA"]]
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(features)

    model = IsolationForest(contamination=0.05, random_state=42)
    preds = model.fit_predict(X_scaled)
    anomalies = np.where(preds == -1)[0]

    plt.figure(figsize=(10, 4))
    plt.scatter(df.index, df["Temperature_C"], c="black", label="Normal", alpha=0.5)
    plt.scatter(df.index[anomalies], df["Temperature_C"].iloc[anomalies], c="red", label="Anomaly")
    plt.xlabel("Sample")
    plt.ylabel("Temperature (C)")
    plt.title("Anomaly Detection in Transformer Sensor Data")
    plt.legend()
    plt.tight_layout()
    plt.savefig("chapter5_anomaly.png")
    plt.show()

def failure_prediction(df):
    """
    Train Random Forest to classify failure risk.
    """
    X = df[["Temperature_C", "Vibration_g", "OilPressure_psi", "Load_kVA"]]
    y = df["Failure"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    print("Failure Prediction Report:")
    print(classification_report(y_test, y_pred, target_names=["Healthy", "Failure"]))
    print(f"ROC AUC Score: {roc_auc_score(y_test, y_prob):.3f}")

if __name__ == "__main__":
    df_scada = generate_synthetic_scada_data()
    plot_sensor_trends(df_scada)
    anomaly_detection(df_scada)
    failure_prediction(df_scada)