Drift Detection
🛰️ Drift Detectors
The drift-detection module is a lightweight, open-source Python toolkit
that tells STAMM — and the team — when the world has shifted under a
soft sensor's feet. It unifies a curated catalogue of detectors
behind a single DriftDetector.calculate() interface,
and ships self-describing metadata so dashboards and pipelines can
introspect the catalogue without special-casing.
Why this module exists
Most ML systems silently degrade as the world they were trained on changes, a phenomenon called drift. Detecting it is a cornerstone of responsible model maintenance, but the existing ecosystem is fragmented across libraries with very different APIs, dependency trees, and documentation quality. That fragmentation is especially costly in operating regimes where ground-truth labels arrive offline, hours to days late; industrial soft sensors are the canonical example.
The drift_detectors_pack solves this for STAMM:
Every detector subclasses DriftDetector and exposes the same calculate() entry point — dashboards iterate over the catalogue without special-casing each method.
Every detector ships a metadata.yaml with its name, family, parameters, output schema, and references — so the STAMM dashboard can render the catalogue automatically.
The streaming detectors are pure NumPy, with no transitive dependency on heavyweight frameworks. The footprint stays small, which makes them easy to ship to edge or low-resource deployments.
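To make the first point concrete, here is a minimal sketch of what a shared calculate() entry point buys a caller. Everything in it is a hypothetical stand-in written for illustration: the DriftResult dataclass only mirrors the score / drift / details fields used in the quick start, and the two toy detectors and the run_catalogue helper are not part of the package.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

import numpy as np


@dataclass
class DriftResult:
    """Hypothetical result container with score / drift / details fields."""
    score: float
    drift: bool
    details: dict = field(default_factory=dict)


class DriftDetector(ABC):
    """Hypothetical stand-in for the shared base class."""

    @abstractmethod
    def calculate(self, test, ref) -> DriftResult:
        """Compare a test window against a reference window."""


class MeanShift(DriftDetector):
    """Toy detector: mean difference in units of the reference std."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def calculate(self, test, ref) -> DriftResult:
        test, ref = np.asarray(test, float), np.asarray(ref, float)
        score = float(abs(test.mean() - ref.mean()) / (ref.std() + 1e-12))
        return DriftResult(score, score > self.threshold, {"threshold": self.threshold})


class LogVarianceRatio(DriftDetector):
    """Toy detector: absolute log ratio of test and reference variances."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def calculate(self, test, ref) -> DriftResult:
        test, ref = np.asarray(test, float), np.asarray(ref, float)
        score = float(abs(np.log((test.var() + 1e-12) / (ref.var() + 1e-12))))
        return DriftResult(score, score > self.threshold, {"threshold": self.threshold})


def run_catalogue(detectors, test, ref):
    """Run every detector through the same call, keyed by class name."""
    return {type(d).__name__: d.calculate(test, ref) for d in detectors}
```

Because the caller only relies on calculate() and the result fields, adding a detector to the list needs no new branch in the calling code.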
The three detector families
Drift problems arise in three different shapes in practice. The package mirrors that with three families, each living in its own subpackage.
Univariate
One variable at a time — distributional comparisons and sequential change-point detectors.
Multivariate
Joint distribution over the full feature matrix — kernel, projection, and partition-based methods.
Model-based
Compares predictions from multiple co-deployed soft sensors — the front-line signal when ground truth lags by days.
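As an illustration of the univariate family, the Population Stability Index compares binned proportions of a reference and a test sample. The sketch below is a textbook formulation in plain NumPy, not the package's PSI class: the quantile binning, the n_bins default, and the eps floor are assumptions chosen for the example.

```python
import numpy as np


def psi(test, ref, n_bins=10, eps=1e-6):
    """Population Stability Index of `test` against `ref`.

    Bins are delimited by reference quantiles, so each bin holds roughly
    the same share of the reference sample; the outer bins are open-ended.
    """
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Interior edges only: n_bins - 1 cut points define n_bins bins.
    edges = np.quantile(ref, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    ref_counts = np.bincount(np.searchsorted(edges, ref), minlength=n_bins)
    test_counts = np.bincount(np.searchsorted(edges, test), minlength=n_bins)
    # Floor the proportions so empty bins do not produce log(0).
    p = np.clip(ref_counts / ref.size, eps, None)
    q = np.clip(test_counts / test.size, eps, None)
    return float(np.sum((q - p) * np.log(q / p)))
```

Each term (q - p) * log(q / p) is non-negative, so the index is zero only when the binned proportions match. Values around 0.1 and 0.25 are often quoted as rule-of-thumb warning and drift levels, but suitable thresholds depend on the process being monitored.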
Detector catalogue
Tip: every result's details dictionary records the configured parameters, the mode (online / offline), and the reference and test sample sizes. Detector-specific keys (p_value, gamma, warning, metric_means, pairwise, …) are documented in each detector's metadata.yaml.
Quick start
Install from PyPI (or directly from source):
pip install stamm-drift-detectors
Use any detector through the same interface:
import numpy as np
from drift_detectors import PSI, MMDDetector, ModelDisagreementMetric
# --- Univariate ---
ref = np.random.normal(0.0, 1.0, 1000)
test = np.random.normal(0.3, 1.0, 1000)
res = PSI().calculate(test, ref)
print(res.score, res.drift, res.details)
# --- Multivariate: kernel MMD with median-distance heuristic, ---
# --- recommended for unscaled industrial data ---
ref_mv = np.random.normal(0.0, 1.0, size=(500, 4))
test_mv = np.random.normal(0.4, 1.0, size=(500, 4))
print(MMDDetector(gamma="median").calculate(test_mv, ref_mv))
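# --- Model-based: disagreement across two co-deployed soft sensors ---
# Assumption: the two metric classes are exported at the package top level.
from drift_detectors import MSEDisagreement, PearsonDisagreement
# Placeholder predictions from two soft sensors on the same inputs.
y_linear = np.random.normal(0.0, 1.0, 500)
y_tree = y_linear + np.random.normal(0.0, 0.2, 500)
mdm = ModelDisagreementMetric(
    metrics=[MSEDisagreement(), PearsonDisagreement()],
    threshold=0.25,
)
print(mdm.calculate(predictions=[y_linear, y_tree]).score)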
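The multivariate example above passes gamma="median". As a rough sketch of what a median-distance heuristic does, the code below sets the RBF bandwidth from the pooled data and computes the biased estimate of squared MMD. It is an illustration under stated assumptions, not the package's MMDDetector: the convention gamma = 1 / (2 * median squared distance) is one of several in use, and the function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a and rows of b."""
    d = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * (a @ b.T)
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off


def median_gamma(ref, test):
    """Median heuristic (assumed form): 1 / (2 * median squared distance)."""
    pooled = np.vstack([ref, test])
    d = pairwise_sq_dists(pooled, pooled)
    upper = d[np.triu_indices_from(d, k=1)]  # distinct pairs only
    return 1.0 / (2.0 * np.median(upper))


def mmd2_rbf(test, ref, gamma="median"):
    """Biased squared-MMD estimate with kernel exp(-gamma * ||x - y||^2)."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if isinstance(gamma, str):  # only "median" is handled in this sketch
        gamma = median_gamma(ref, test)
    k_tt = np.exp(-gamma * pairwise_sq_dists(test, test)).mean()
    k_rr = np.exp(-gamma * pairwise_sq_dists(ref, ref)).mean()
    k_tr = np.exp(-gamma * pairwise_sq_dists(test, ref)).mean()
    return float(k_tt + k_rr - 2.0 * k_tr)
```

Since gamma scales inversely with the squared distances, multiplying both samples by a constant leaves the statistic unchanged, which is why a data-driven bandwidth is convenient for unscaled data.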
Streaming detectors take one observation at a time:
from drift_detectors import PageHinkley, HDDM_A
ph = PageHinkley(online=True)
hd = HDDM_A(online=True)
# `stream` is any iterable of scalar observations;
# `notify_dashboard` stands for your own alerting hook.
for x in stream:
    if ph.calculate([x]).drift:  # cumulative-deviation change point
        notify_dashboard("PageHinkley drift")
    if hd.calculate([x]).drift:  # Hoeffding-bound change point
        notify_dashboard("HDDM-A drift")
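For readers unfamiliar with the cumulative-deviation idea behind Page-Hinkley, here is a textbook one-sided version that flags an upward shift in the mean. It is a sketch for intuition only, not the package's PageHinkley: the class name, the update() method, and the delta, threshold, and min_samples defaults are all assumptions.

```python
class PageHinkleySketch:
    """Textbook one-sided Page-Hinkley test for an increase in the mean."""

    def __init__(self, delta=0.005, threshold=50.0, min_samples=30):
        self.delta = delta              # tolerated drift magnitude
        self.threshold = threshold      # alarm level (often written lambda)
        self.min_samples = min_samples  # warm-up before alarms are allowed
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True while an alarm is active."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental running mean
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return bool(
            self.n >= self.min_samples
            and (self.cum - self.cum_min) > self.threshold
        )
```

The state is a handful of floats, so the per-observation cost is constant, which is the property that makes this kind of detector suitable for streaming use.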
How it plugs into STAMM
drift_detectors_pack is component (V) of the STAMM platform. It runs alongside the Workflow Orchestrator and the Model Registry, evaluating live inputs against the reference window each soft sensor was trained on. The drift scores it produces feed directly into the monitoring view of the Dashboard, where they sit next to model predictions and reference data.
The Orchestrator pulls reference and live snapshots from the Time-series DB and runs the configured detectors per soft sensor.
The Plotly dashboard renders univariate and multivariate drift indicators with the metadata each detector carries — no per-detector view code.
The ModelDisagreementMetric drives STAMM's model-divergence panel — surfacing when co-deployed soft sensors start to disagree.
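The idea behind a model-divergence signal can be sketched without ground truth: compare the predictions of co-deployed models pair by pair. The code below is an illustrative guess at that pattern, not the ModelDisagreementMetric implementation. The function names, the max-based aggregation, and the returned dictionary layout are assumptions; the metric_means and pairwise keys only borrow the names listed in the catalogue tip.

```python
from itertools import combinations

import numpy as np


def mse_disagreement(a, b):
    """Mean squared difference between two prediction vectors."""
    return float(np.mean((a - b) ** 2))


def pearson_disagreement(a, b):
    """One minus the Pearson correlation: 0 when perfectly correlated."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


def model_disagreement(predictions, metrics, threshold):
    """Average each metric over all model pairs and flag large disagreement."""
    preds = [np.asarray(p, dtype=float) for p in predictions]
    pairwise = {
        name: [fn(a, b) for a, b in combinations(preds, 2)]
        for name, fn in metrics.items()
    }
    metric_means = {name: float(np.mean(v)) for name, v in pairwise.items()}
    # Simplification: the metrics live on different scales, so taking the
    # maximum is only a crude way to reduce them to a single score.
    score = max(metric_means.values())
    return {
        "score": score,
        "drift": score > threshold,
        "metric_means": metric_means,
        "pairwise": pairwise,
    }
```

A call such as model_disagreement([y_a, y_b], {"mse": mse_disagreement, "pearson": pearson_disagreement}, threshold=0.25) needs only the models' outputs, which is what makes this family usable while labels are still days away.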
The package is usable on its own — outside STAMM — anywhere drift detection on tabular or streaming data is needed.
The IndPenSim case study
The package ships a full reproduction of the experiments from the AI4D 2026 / CAEPIA 2026 companion paper at use_cases/IndPenSim/. The case study uses a 100-batch IndPenSim dataset and runs each detector independently in the four canonical fermentation phases (lag, log/exponential, stationary, death) — respecting the non-stationarity of fed-batch fermentation. Pure-NumPy reimplementations of the four interpretable soft sensors from Acosta-Pavas et al. (2024) — CART, M5, CUBIST, Random Forest — are included.
# 1. Fetch the 100-batch IndPenSim CSV (~21 MB)
python use_cases/IndPenSim/data/download_indpensim.py
# 2. Run the two paper experiments
python -m use_cases.IndPenSim.experiments.run_experiment_1
python -m use_cases.IndPenSim.experiments.run_experiment_2
# 3. (Optional) regenerate the paper figures
python use_cases/IndPenSim/figures/make_class_diagram.py
python use_cases/IndPenSim/figures/make_fault_timelines.py
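The per-phase evaluation described above can be pictured as splitting both windows by phase label before scoring. The sketch below uses a two-sample Kolmogorov-Smirnov statistic as a stand-in detector; it is not taken from the experiment scripts, and the phase label strings, function names, and array layout are assumptions.

```python
import numpy as np

# Assumed label strings for the four fermentation phases.
PHASES = ("lag", "exponential", "stationary", "death")


def ks_statistic(test, ref):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    test = np.sort(np.asarray(test, dtype=float))
    ref = np.sort(np.asarray(ref, dtype=float))
    grid = np.concatenate([test, ref])
    cdf_test = np.searchsorted(test, grid, side="right") / test.size
    cdf_ref = np.searchsorted(ref, grid, side="right") / ref.size
    return float(np.max(np.abs(cdf_test - cdf_ref)))


def per_phase_scores(ref_values, ref_phase, test_values, test_phase):
    """Score each phase separately so phase-to-phase shifts are not mixed in."""
    ref_values, ref_phase = np.asarray(ref_values, float), np.asarray(ref_phase)
    test_values, test_phase = np.asarray(test_values, float), np.asarray(test_phase)
    scores = {}
    for phase in PHASES:
        r = ref_values[ref_phase == phase]
        t = test_values[test_phase == phase]
        if r.size and t.size:  # skip phases missing from either window
            scores[phase] = ks_statistic(t, r)
    return scores
```

Comparing like with like keeps the normal change in level between phases from being reported as drift, and it localizes a fault to the phase where it occurs.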
Citing
If you use drift_detectors_pack in your work, please cite:
@inproceedings{corrales2026driftdetectors,
  title     = {drift_detectors_pack: An Open-Source Drift Detection Toolkit
               for Soft-Sensor Monitoring in Industrial Bioprocesses},
  author    = {Corrales, David Camilo and Crowther, Matthew and Metcalfe, Brett and
               Koehorst, Jasper J. and Su\'arez Mu\~noz, Carlos Alberto},
  booktitle = {Proceedings of the 2nd Workshop on Artificial Intelligence for
               Development (AI4D 2026), CAEPIA 2026},
  year      = {2026}
}