Drift Detection

🛰️ Drift Detectors

The drift-detection module is a lightweight, open-source Python toolkit that tells STAMM — and the team — when the world has shifted under a soft sensor's feet. It unifies a curated catalogue of detectors behind a single DriftDetector.calculate() interface, and ships self-describing metadata so dashboards and pipelines can introspect the catalogue without special-casing.

Detector families in the catalogue: univariate · multivariate · model-based

Python 3.10+: pure-Python, dependency-frugal

Latest release: v0.4.0

Why this module exists

Most ML systems silently degrade as the world they were trained on changes, a phenomenon called drift. Detecting it is the cornerstone of responsible model maintenance, but the existing ecosystem is fragmented across libraries with very different APIs, dependency trees, and documentation quality. That fragmentation is especially costly in operating regimes where ground-truth labels arrive offline, hours to days late; industrial soft sensors are the canonical example.

The drift_detectors_pack solves this for STAMM:

🔌

One interface

Every detector subclasses DriftDetector and exposes the same calculate() entry point — dashboards iterate over the catalogue without special-casing each method.

📜

Self-describing

Every detector ships a metadata.yaml with its name, family, parameters, output schema, and references — so the STAMM dashboard can render the catalogue automatically.

🪶

Dependency-frugal

The streaming detectors are pure NumPy — no transitive dependency on heavyweight frameworks. Small footprint, easy to ship to edge or low-resource deployments.
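A minimal sketch of what the shared interface makes possible, assuming a result object with score, drift, and details fields (the ones printed in the quick start). The names `SketchResult`, `SketchDetector`, and `MeanShift` are hypothetical and are not the package's own classes; only the (test, reference) argument order is taken from the quick start.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SketchResult:
    """Hypothetical result container with score / drift / details fields."""
    score: float
    drift: bool
    details: dict = field(default_factory=dict)


class SketchDetector(ABC):
    """Hypothetical stand-in for a common detector base class."""

    @abstractmethod
    def calculate(self, test, reference) -> SketchResult:
        ...


class MeanShift(SketchDetector):
    """Toy detector: drift when the means differ by more than `threshold`
    reference standard deviations. Illustrative only."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def calculate(self, test, reference) -> SketchResult:
        test = np.asarray(test, dtype=float)
        reference = np.asarray(reference, dtype=float)
        score = abs(test.mean() - reference.mean()) / (reference.std() + 1e-12)
        return SketchResult(
            score=float(score),
            drift=bool(score > self.threshold),
            details={"threshold": self.threshold,
                     "n_reference": reference.size, "n_test": test.size},
        )


# The caller loops over the collection without knowing any detector's internals.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 1000)
test = rng.normal(1.0, 1.0, 1000)
catalogue = {"mean_shift_loose": MeanShift(1.5), "mean_shift_tight": MeanShift(0.5)}
results = {name: det.calculate(test, reference) for name, det in catalogue.items()}
for name, res in results.items():
    print(name, round(res.score, 3), res.drift)
```

Because every entry answers the same call and returns the same shape, a dashboard or pipeline needs one loop rather than one code path per method.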

The three detector families

Drift problems arise in three different shapes in practice. The package mirrors that with three families, each living in its own subpackage.

📈

Univariate

One variable at a time — distributional comparisons and sequential change-point detectors.

PSI · KS · ADWIN · Page-Hinkley · HDDM-A · EDDM
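To make the distributional side of this family concrete, here is a rough NumPy sketch of the Population Stability Index, PSI = Σ (p_test − p_ref) · ln(p_test / p_ref) summed over bins. The function name `psi_sketch`, the quantile binning, and the `n_bins` and `eps` defaults are assumptions for illustration; the package's own PSI may bin and smooth differently.

```python
import numpy as np


def psi_sketch(test, reference, n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index with bins taken from reference quantiles."""
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Interior bin edges from the reference quantiles; the two outer bins are open-ended.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(edges, reference), minlength=n_bins)
    test_counts = np.bincount(np.searchsorted(edges, test), minlength=n_bins)
    # Clip proportions away from zero so the logarithm stays finite.
    p_ref = np.clip(ref_counts / reference.size, eps, None)
    p_test = np.clip(test_counts / test.size, eps, None)
    return float(np.sum((p_test - p_ref) * np.log(p_test / p_ref)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
psi_same = psi_sketch(rng.normal(0.0, 1.0, 5000), reference)   # same distribution
psi_shift = psi_sketch(rng.normal(1.0, 1.0, 5000), reference)  # mean shifted by one sigma
print(psi_same, psi_shift)
```

The unshifted sample should score close to zero and the shifted one clearly higher; where to place the alarm threshold is a deployment choice.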

🌐

Multivariate

Joint distribution over the full feature matrix — kernel, projection, and partition-based methods.

MMD · PCA-CD · KDQ-Tree
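As an illustration of the kernel approach, the sketch below computes a biased estimate of the squared Maximum Mean Discrepancy with an RBF kernel, setting the bandwidth from the median pairwise squared distance. The function name `mmd2_sketch` and the exact form of the heuristic are assumptions; the `gamma="median"` spelling merely echoes the quick start and says nothing about how the package's MMDDetector is implemented.

```python
import numpy as np


def _sq_dists(a, b):
    """Pairwise squared Euclidean distances between the rows of a and b."""
    d = (a * a).sum(axis=1)[:, None] + (b * b).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # guard against tiny negative values from rounding


def mmd2_sketch(test, reference, gamma="median") -> float:
    """Biased estimate of squared MMD with kernel exp(-gamma * squared distance)."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(test, dtype=float)
    if isinstance(gamma, str) and gamma == "median":
        # Median heuristic: gamma = 1 / (2 * median squared distance) over the pooled sample.
        pooled = np.vstack([x, y])
        d2 = _sq_dists(pooled, pooled)
        gamma = 1.0 / (2.0 * np.median(d2[np.triu_indices_from(d2, k=1)]))
    k_xx = np.exp(-gamma * _sq_dists(x, x)).mean()
    k_yy = np.exp(-gamma * _sq_dists(y, y)).mean()
    k_xy = np.exp(-gamma * _sq_dists(x, y)).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(300, 4))
mmd_same = mmd2_sketch(rng.normal(0.0, 1.0, size=(300, 4)), reference)
mmd_shift = mmd2_sketch(rng.normal(1.0, 1.0, size=(300, 4)), reference)
print(mmd_same, mmd_shift)
```

Deriving the bandwidth from the data avoids hand-tuning it to the scale of each feature matrix. A real detector would still need a decision rule on top of this statistic, for example a permutation test.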

🤖

Model-based

Compares predictions from multiple co-deployed soft sensors — the front-line signal when ground truth lags by days.

MDM
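The idea can be sketched without any labels: when two models that normally agree start producing different predictions, the inputs have probably left the region both were trained on. The functions below are hypothetical illustrations of pairwise disagreement and are not the package's ModelDisagreementMetric; both toy models are invented for the example.

```python
import numpy as np


def mse_disagreement(a, b) -> float:
    """Mean squared difference between two prediction vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))


def pearson_disagreement(a, b) -> float:
    """1 - Pearson correlation: near 0 when predictions move together."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1.0 - np.corrcoef(a, b)[0, 1])


def disagreement_sketch(predictions, metric=mse_disagreement) -> dict:
    """Average a pairwise metric over all pairs of models."""
    pairwise = {}
    for i in range(len(predictions)):
        for j in range(i + 1, len(predictions)):
            pairwise[(i, j)] = metric(predictions[i], predictions[j])
    return {"score": float(np.mean(list(pairwise.values()))), "pairwise": pairwise}


def linear_model(x):
    return 2.0 * x                          # extrapolates freely


def saturating_model(x):
    return 2.0 * np.clip(x, None, 1.0)      # flat outside the training range, like a tree

x_in = np.linspace(0.0, 1.0, 101)      # inside the training range
x_drift = np.linspace(1.0, 2.0, 101)   # inputs have drifted upward

in_range = disagreement_sketch([linear_model(x_in), saturating_model(x_in)])
drifted = disagreement_sketch([linear_model(x_drift), saturating_model(x_drift)])
print(in_range["score"], drifted["score"])
```

The two toy models agree exactly inside the training range and diverge outside it, so the score rises with the input shift even though no ground truth is available.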

Detector catalogue

Name	Family	Mode	Reference
PSI	Univariate · distributional	batch	Wu & Olson, 2010
KS-test	Univariate · distributional	batch	Smirnov, 1948
ADWIN	Univariate · sequential	streaming	Bifet & Gavaldà, 2007
Page-Hinkley	Univariate · sequential	streaming	Page, 1954
HDDM-A	Univariate · sequential	streaming	Frías-Blanco et al., 2015
EDDM	Univariate · error-based	streaming	Baena-García et al., 2006
MMD	Multivariate · kernel	batch	Gretton et al., 2012
PCA-CD	Multivariate · projection	batch	Qahtan et al., 2015
KDQ-Tree	Multivariate · partition	batch	Dasu et al., 2006
MDM	Model-based · ensemble	batch	Suárez et al., 2026 (STAMM)

Tip: every detector's result carries a details dictionary holding the configured parameters, the mode (online / offline), and the reference and test sample sizes. Detector-specific keys (p_value, gamma, warning, metric_means, pairwise, …) are documented in each detector's metadata.yaml.
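The sketch below shows how a consumer might build a catalogue view purely from metadata. The records are plain dictionaries shaped after the fields the text lists (name, family, parameters, output schema, references); the key names, parameter defaults, and the `render_catalogue` helper are hypothetical, and the real metadata.yaml layout may differ.

```python
# Hypothetical records, as they might look after parsing each metadata.yaml.
catalogue_metadata = [
    {
        "name": "PSI",
        "family": "univariate",
        "parameters": {"n_bins": {"type": "int", "default": 10}},
        "output": {"score": "float", "drift": "bool", "details": "dict"},
        "references": ["Wu & Olson, 2010"],
    },
    {
        "name": "MMD",
        "family": "multivariate",
        "parameters": {"gamma": {"type": "float or 'median'", "default": "median"}},
        "output": {"score": "float", "drift": "bool", "details": "dict"},
        "references": ["Gretton et al., 2012"],
    },
]


def render_catalogue(records) -> str:
    """Build a plain-text table from metadata alone, with no per-detector code."""
    lines = [f"{'Name':<6}{'Family':<14}{'Parameters':<12}Reference"]
    for rec in records:
        params = ", ".join(rec["parameters"]) or "-"
        lines.append(
            f"{rec['name']:<6}{rec['family']:<14}{params:<12}{rec['references'][0]}"
        )
    return "\n".join(lines)


table = render_catalogue(catalogue_metadata)
print(table)
```

Adding a detector then only means adding one more record; the rendering code stays untouched.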

Quick start

Install from PyPI (or directly from source):

pip install stamm-drift-detectors

Use any detector through the same interface:

import numpy as np
# MSEDisagreement and PearsonDisagreement are assumed to be importable from the
# package top level, like the detectors.
from drift_detectors import (
    PSI, MMDDetector, ModelDisagreementMetric,
    MSEDisagreement, PearsonDisagreement,
)

# --- Univariate ---
ref  = np.random.normal(0.0, 1.0, 1000)
test = np.random.normal(0.3, 1.0, 1000)
res = PSI().calculate(test, ref)
print(res.score, res.drift, res.details)

# --- Multivariate: kernel MMD with the median-distance heuristic, ---
# --- recommended for unscaled industrial data ---
ref_mv  = np.random.normal(0.0, 1.0, size=(500, 4))
test_mv = np.random.normal(0.4, 1.0, size=(500, 4))
print(MMDDetector(gamma="median").calculate(test_mv, ref_mv))

# --- Model-based: disagreement across two co-deployed soft sensors ---
# Placeholder predictions; in practice these come from the deployed models.
y_linear = np.random.normal(0.0, 1.0, 200)
y_tree   = y_linear + np.random.normal(0.0, 0.1, 200)
mdm = ModelDisagreementMetric(
    metrics=[MSEDisagreement(), PearsonDisagreement()],
    threshold=0.25,
)
print(mdm.calculate(predictions=[y_linear, y_tree]).score)

Streaming detectors take one observation at a time:

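import numpy as np
from drift_detectors import PageHinkley, HDDM_A

ph = PageHinkley(online=True)
hd = HDDM_A(online=True)

# Placeholder stream and alert hook; swap in the live feed and the real notifier.
stream = np.random.normal(0.0, 1.0, 1000)
notify_dashboard = print

for x in stream:
    if ph.calculate([x]).drift:    # cumulative-deviation change point
        notify_dashboard("PageHinkley drift")
    if hd.calculate([x]).drift:    # Hoeffding-bound change point
        notify_dashboard("HDDM-A drift")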
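For readers who want to see what a cumulative-deviation change-point test does internally, here is a small sketch of the textbook one-sided Page-Hinkley test. The class name `PageHinkleySketch` and the `delta` and `threshold` parameters with their defaults are assumptions for illustration; the package's PageHinkley may use different parameters and may also track downward shifts.

```python
import numpy as np


class PageHinkleySketch:
    """One-sided Page-Hinkley test for an upward shift in the mean."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level
        self.n = 0
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the running mean
        self.cum_min = 0.0          # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True when drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


# Synthetic stream: the mean jumps from 0 to 2 at index 500.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0.0, 0.5, 500), rng.normal(2.0, 0.5, 500)])

ph = PageHinkleySketch()
first_alarm = next((i for i, x in enumerate(stream) if ph.update(x)), None)
print("first alarm at index", first_alarm)
```

The detector keeps only four numbers of state, which is why this style of test suits one-observation-at-a-time processing on constrained hardware.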

How it plugs into STAMM

drift_detectors_pack is component (V) of the STAMM platform. It runs alongside the Workflow Orchestrator and the Model Registry, evaluating live inputs against the reference window each soft sensor was trained on. The drift scores it produces feed directly into the monitoring view of the Dashboard, where they sit next to model predictions and reference data.

Live monitoring

The Orchestrator pulls reference and live snapshots from the Time-series DB and runs the configured detectors per soft sensor.

Dashboard panel

The Plotly dashboard renders univariate and multivariate drift indicators with the metadata each detector carries — no per-detector view code.

Model divergence

The ModelDisagreementMetric drives STAMM's model-divergence panel — surfacing when co-deployed soft sensors start to disagree.

Stand-alone too

The package is usable on its own — outside STAMM — anywhere drift detection on tabular or streaming data is needed.

The IndPenSim case study

The package ships a full reproduction of the experiments from the AI4D 2026 / CAEPIA 2026 companion paper at use_cases/IndPenSim/. The case study uses a 100-batch IndPenSim dataset and runs each detector independently in the four canonical fermentation phases (lag, log/exponential, stationary, death) — respecting the non-stationarity of fed-batch fermentation. Pure-NumPy reimplementations of the four interpretable soft sensors from Acosta-Pavas et al. (2024) — CART, M5, CUBIST, Random Forest — are included.

# 1. Fetch the 100-batch IndPenSim CSV (~21 MB)
python use_cases/IndPenSim/data/download_indpensim.py

# 2. Run the two paper experiments
python -m use_cases.IndPenSim.experiments.run_experiment_1
python -m use_cases.IndPenSim.experiments.run_experiment_2

# 3. (Optional) regenerate the paper figures
python use_cases/IndPenSim/figures/make_class_diagram.py
python use_cases/IndPenSim/figures/make_fault_timelines.py
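The per-phase evaluation described above can be sketched on synthetic data. This is not the case study's code and does not use IndPenSim data: the phase levels, the injected fault, and the `ks_statistic` helper are hypothetical stand-ins chosen to show why each phase is compared only with its own reference.

```python
import numpy as np


def ks_statistic(a, b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


rng = np.random.default_rng(0)
phases = ["lag", "exponential", "stationary", "death"]
n = 400

# Synthetic process variable: each phase has its own typical level, so pooling
# the phases would mix four different distributions.
phase_level = {"lag": 0.0, "exponential": 3.0, "stationary": 6.0, "death": 4.0}
reference = {p: rng.normal(phase_level[p], 1.0, n) for p in phases}
test = {p: rng.normal(phase_level[p], 1.0, n) for p in phases}
test["stationary"] = test["stationary"] + 1.5   # inject a fault in one phase only

# Compare each phase of the test batch with the same phase of the reference.
scores = {p: ks_statistic(test[p], reference[p]) for p in phases}
for p, s in scores.items():
    print(f"{p:<12}{s:.3f}")
```

Only the phase with the injected fault should stand out, while the normal level changes between phases do not register as drift.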

Citing

If you use drift_detectors_pack in your work, please cite:

@inproceedings{corrales2026driftdetectors,
  title     = {drift_detectors_pack: An Open-Source Drift Detection Toolkit
               for Soft-Sensor Monitoring in Industrial Bioprocesses},
  author    = {Corrales, David Camilo and Crowther, Matthew and Metcalfe, Brett and
               Koehorst, Jasper J. and Su\'arez Mu\~noz, Carlos Alberto},
  booktitle = {Proceedings of the 2nd Workshop on Artificial Intelligence for
               Development (AI4D 2026), CAEPIA 2026},
  year      = {2026}
}

🛰️

Explore the code & contribute

Apache 2.0 licensed. Issues, pull requests, and new detectors welcome: the architecture is designed so that adding a detector is a one-folder change.

View on GitHub →
