Cybersecurity Analytics for Critical Infrastructure

The Business Problem: Protecting the Grid from Evolving Threats

Utilities face increasing cybersecurity threats targeting both their IT systems and their operational technology (OT). Attacks on critical infrastructure can disrupt service, damage equipment, and even compromise public safety. High-profile incidents, such as the 2015 attack on Ukrainian distribution utilities that cut power to customers, have demonstrated how vulnerable control systems and field devices can be to cyber intrusions.

Unlike traditional corporate IT environments, utilities operate industrial control systems with unique constraints. SCADA networks and field devices often run on legacy protocols such as Modbus and DNP3, which were designed with little or no built-in authentication or encryption, making them attractive targets. Additionally, operational environments cannot tolerate frequent downtime, which complicates patching and the deployment of traditional IT security tools.

The consequences of a successful attack are severe. Malicious actors could manipulate breaker controls, disable protective relays, or disrupt market operations. The interconnected nature of modern grids amplifies the risk, as an attack on one utility can cascade to others. Regulators have responded with standards such as NERC CIP, but compliance alone is insufficient in an era of fast-moving and sophisticated threats.

The Analytics Solution: Data-Driven Intrusion Detection

Cybersecurity analytics uses machine learning to identify unusual network traffic, unauthorized access attempts, and other anomalies that may indicate an intrusion. It complements traditional signature-based detection, which struggles against novel attacks and insider threats that do not match known patterns.

Anomaly detection models are well suited to critical infrastructure environments. These models learn what normal operational behavior looks like, then flag deviations in real time. For example, they might detect unusual traffic patterns on SCADA networks or abnormal sequences of commands sent to field devices.
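The idea of learning a baseline and flagging deviations can be sketched in a few lines. This is a minimal illustration, not the chapter's demo: the two features (packets per second and mean packet size) and all values are synthetic assumptions, and Isolation Forest is just one of several algorithms that could fill this role.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" baseline: [packets per second, mean packet size in bytes].
# These features and values are invented for illustration only.
normal_flows = np.column_stack([
    rng.normal(100, 10, size=1000),
    rng.normal(500, 50, size=1000),
])

# Learn the baseline from traffic assumed to be attack-free.
model = IsolationForest(n_estimators=100, random_state=0)
model.fit(normal_flows)

# Score new observations: one typical flow and one far outside the baseline.
new_flows = np.array([
    [100.0, 500.0],    # close to the learned baseline
    [1000.0, 5000.0],  # far larger and more frequent packets than usual
])
labels = model.predict(new_flows)            # 1 = normal, -1 = anomaly
scores = model.decision_function(new_flows)  # lower = more anomalous

for flow, label, score in zip(new_flows, labels, scores):
    status = "ANOMALY" if label == -1 else "normal"
    print(f"flow={flow.tolist()} score={score:.3f} -> {status}")
```

In practice the baseline would come from a period of operational traffic believed to be clean, and the feature set would reflect the protocols in use on the network.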

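Supervised learning can also be applied where labeled attack data is available. Public datasets such as CICIDS2017 provide labeled examples of benign and attack traffic that can be used to train classifiers to distinguish legitimate activity from malicious activity. Because such datasets capture general-purpose IT network traffic rather than utility-specific protocols, models trained on them should be validated against a utility's own traffic before deployment. Combined with real-time monitoring, these models strengthen defenses against evolving threats.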
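When labels identify the kind of attack, the same approach extends from a benign-versus-attack decision to multi-class classification. The sketch below assumes synthetic data from scikit-learn's make_classification; the class names are hypothetical placeholders and the features carry no real traffic semantics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical class names; the synthetic data below is not real network traffic.
class_names = ["BENIGN", "DoS", "PortScan"]

# Three well-separated synthetic classes with eight numeric features.
X, y = make_classification(
    n_samples=900, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# Stratify so every class appears in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

# Per-class precision and recall show which attack types are confused.
accuracy = (preds == y_test).mean()
print(classification_report(y_test, preds, target_names=class_names))
```

Per-class metrics matter more than overall accuracy here, since rare attack types can be missed entirely while the headline number still looks strong.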

Operational Benefits

Integrating analytics into cybersecurity provides several benefits. It enables faster detection of threats that evade conventional tools, reducing dwell time and limiting potential damage. When alert thresholds are tuned so that only high-risk anomalies are escalated, it can also reduce false positives and ease the burden on security operations centers.
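One simple way to focus analysts on high-risk anomalies is to rank events by anomaly score and escalate only as many as the team can review. The helper below is a hypothetical sketch: select_alerts and its budget parameter are illustrative names, and it assumes lower scores mean more anomalous, as with the decision_function of scikit-learn's outlier detectors.

```python
import numpy as np

def select_alerts(anomaly_scores, budget):
    """Return indices of the `budget` most anomalous events, most anomalous first.

    Assumes lower scores mean more anomalous.
    """
    scores = np.asarray(anomaly_scores, dtype=float)
    budget = min(budget, scores.size)
    order = np.argsort(scores, kind="stable")  # ascending: most anomalous first
    return order[:budget]

# Illustrative scores for six events (values invented for the example).
scores = [0.12, -0.30, 0.05, -0.02, 0.20, -0.15]
alerts = select_alerts(scores, budget=2)
print(alerts.tolist())  # [1, 5]
```

A fixed review budget trades some recall for a predictable analyst workload; a score threshold is the alternative when every sufficiently suspicious event must be seen.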

These capabilities are especially valuable for utilities adopting more digital and distributed technologies. As advanced metering, DER integration, and remote monitoring expand the attack surface, analytics offers scalable ways to manage risk without overwhelming human analysts.

Transition to the Demo

In this chapter’s demo, we will work with network traffic data to:

Train an anomaly detection model to identify unusual patterns in operational network flows.
Apply a supervised classification model using labeled intrusion data to detect specific attack types.
Discuss how these models can be integrated into real-time monitoring environments to augment existing security tools.

By combining machine learning with cybersecurity practices, utilities can build smarter defenses tailored to their unique operational context and protect critical infrastructure from growing cyber threats.
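The third step, integration into real-time monitoring, is discussed rather than implemented in the demo. As a rough sketch of one possible shape, a fitted detector could score small batches of flows as they arrive and emit alerts for flagged rows. The function names build_detector and monitor are hypothetical, the data is synthetic, and a production deployment would read from a live feed rather than in-memory arrays.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_detector(baseline):
    """Fit scaling and an Isolation Forest on baseline traffic assumed benign."""
    detector = make_pipeline(StandardScaler(), IsolationForest(random_state=0))
    detector.fit(baseline)
    return detector

def monitor(detector, batches):
    """Yield (batch_index, row_index, score) for each flow flagged as anomalous."""
    for batch_idx, batch in enumerate(batches):
        labels = detector.predict(batch)            # -1 = anomaly, 1 = normal
        scores = detector.decision_function(batch)  # lower = more anomalous
        for row_idx in np.flatnonzero(labels == -1):
            yield batch_idx, int(row_idx), float(scores[row_idx])

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, size=(2000, 4))  # synthetic stand-in for normal flows

# Two incoming batches: the first is typical, the second contains one extreme flow.
quiet_batch = np.zeros((1, 4))
noisy_batch = np.vstack([np.zeros((1, 4)), np.full((1, 4), 25.0)])

detector = build_detector(baseline)
alerts = list(monitor(detector, [quiet_batch, noisy_batch]))
for batch_idx, row_idx, score in alerts:
    print(f"ALERT batch={batch_idx} row={row_idx} score={score:.3f}")
```

Bundling the scaler and the model in one pipeline keeps preprocessing identical between training and scoring, which is easy to get wrong when the two run in separate systems.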


Code

"""
Chapter 13: Cybersecurity Analytics for Critical Infrastructure
Intrusion detection using anomaly detection (Isolation Forest) and supervised ML (Random Forest).
"""

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score

def load_cicids_sample(file_path="data/CICIDS2017_sample.csv"):
    """
    Load a cleaned subset of the CICIDS2017 dataset.
    """
    df = pd.read_csv(file_path)
    print(f"Loaded dataset: {df.shape[0]} rows, {df.shape[1]} columns")
    return df

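def preprocess(df):
    """
    Prepare data: clean and scale numeric features, encode labels (BENIGN=1, Attack=0).
    """
    # Defensive cleanup: raw CICIDS2017 exports can have padded column names
    # (e.g. " Label") and inf/NaN values in rate features.
    df = df.rename(columns=str.strip)
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    # Keep numeric features only; StandardScaler cannot handle text columns.
    X = df.drop(columns=["Label"]).select_dtypes(include=[np.number])
    y = (df["Label"] == "BENIGN").astype(int)
    # The scaler is fit on all rows. Tree ensembles are insensitive to feature
    # scaling, but fit it on training data only if you swap in another model.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    return X_scaled, y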

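def run_anomaly_detection(X, y, contamination=0.1):
    """
    Detect anomalies using Isolation Forest (unsupervised; labels are used only for evaluation).
    """
    # contamination is the assumed share of anomalous flows; tune it to the data.
    iso = IsolationForest(contamination=contamination, random_state=42)
    # fit_predict returns 1 for inliers and -1 for outliers.
    preds = iso.fit_predict(X)
    # Map to the label encoding used in preprocess(): 1 = benign, 0 = attack.
    preds_binary = np.where(preds == 1, 1, 0)
    accuracy = (preds_binary == y).mean()
    print(f"Isolation Forest Accuracy: {accuracy:.2f}")
    # Accuracy can mislead on imbalanced traffic, so also report per-class metrics.
    print(classification_report(y, preds_binary, target_names=["Attack", "Benign"], zero_division=0))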

def run_supervised_detection(X, y):
    """
    Train Random Forest classifier for intrusion detection.
    """
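    # Stratify so the train and test sets keep the same benign/attack ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    clf = RandomForestClassifier(n_estimators=150, random_state=42)
    clf.fit(X_train, y_train)
    preds = clf.predict(X_test)
    # Column 1 holds the predicted probability of class 1 (BENIGN).
    probs = clf.predict_proba(X_test)[:, 1]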

    print("Random Forest Classification Report:")
    print(classification_report(y_test, preds, target_names=["Attack", "Benign"]))
    print(f"ROC AUC Score: {roc_auc_score(y_test, probs):.3f}")

if __name__ == "__main__":
    df = load_cicids_sample()
    X, y = preprocess(df)
    run_anomaly_detection(X, y)
    run_supervised_detection(X, y)