Service Orchestrator

Orchestrator · Apache Airflow

⚙️ Workflow Orchestrator

The operational engine that coordinates data acquisition, model execution, and prediction storage in the soft-sensor workflow — reproducibly, event-driven, end-to-end.

Airflow 3.0.6: official Docker image, Python 3.12.11
STAMM_DAG: single DAG orchestrating the workflow
4 steps: Health → Detect → Infer → Store
REST: models served remotely, never installed locally

What it does

STAMM uses Apache Airflow 3.0.6 (Python 3.12.11) in its official Docker image as the operational engine of the soft-sensor workflow. The time-series database holds raw measurements, metadata, and model predictions; Airflow makes sure these heterogeneous data flows run in a temporally consistent and fully automated way. A dedicated DAG (STAMM_DAG) drives the process in real time.
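As a framework-free illustration of what "a DAG" means here, the four steps can be written as a dependency graph and ordered with Python's standard `graphlib` module. This is only a sketch: the step names are illustrative labels, not identifiers taken from STAMM_DAG, and the real workflow is defined and scheduled by Airflow rather than by this code.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it may start.
# The names are illustrative labels, not identifiers from STAMM_DAG.
STEP_DEPENDENCIES = {
    "health_check": set(),
    "data_detection": {"health_check"},
    "model_inference": {"data_detection"},
    "prediction_storage": {"model_inference"},
}


def execution_order(dependencies):
    """Return a valid execution order for the dependency graph."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(STEP_DEPENDENCIES)
print(" -> ".join(order))
# health_check -> data_detection -> model_inference -> prediction_storage
```

Because the graph is a simple chain, there is exactly one valid order, which matches Health → Detect → Infer → Store.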

The four DAG steps

01 · 🩺 Health Check

Confirms connectivity with the time-series database and ensures the main buckets — stamm_raw, stamm_predictions, stamm_metadata — are available before anything else runs.
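The bucket check can be sketched as a pure function over a list of bucket names. How that list is fetched from the time-series database is not described here, so the sketch assumes it has already been retrieved; `missing_buckets` and `health_check` are hypothetical helper names, and the sample listing is made up.

```python
REQUIRED_BUCKETS = ("stamm_raw", "stamm_predictions", "stamm_metadata")


def missing_buckets(available, required=REQUIRED_BUCKETS):
    """Return the required bucket names that are absent from `available`."""
    present = set(available)
    return [name for name in required if name not in present]


def health_check(available):
    """Raise RuntimeError if any required bucket is missing."""
    missing = missing_buckets(available)
    if missing:
        raise RuntimeError("missing buckets: " + ", ".join(missing))


# Made-up listing, standing in for what a database client might return.
listing = ["stamm_raw", "stamm_metadata", "_monitoring"]
print(missing_buckets(listing))  # ['stamm_predictions']
```

Raising an exception is one way to make an orchestrator mark the step as failed so that later steps do not run against an incomplete database.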

02 · 📡 Data Detection

Monitors stamm_raw for new measurements and assembles wide-format snapshots that capture the latest process state — already shaped for ML model input.
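A wide-format snapshot can be illustrated by pivoting long-format records (one row per measurement) into a single row holding the latest value of each signal. The record keys `time`, `field`, and `value`, the helper names, and the sample data are assumptions made for this sketch; it also assumes the snapshot timestamp is that of the newest measurement, which STAMM may define differently.

```python
from datetime import datetime, timezone


def latest_snapshot(records):
    """Pivot long-format records into one wide row of the latest values.

    Returns None when there are no records, so a caller can skip inference.
    """
    latest = {}
    for record in records:
        current = latest.get(record["field"])
        if current is None or record["time"] > current["time"]:
            latest[record["field"]] = record
    if not latest:
        return None
    # Assumption: the snapshot is stamped with the newest measurement time.
    snapshot = {"time": max(r["time"] for r in latest.values())}
    for field, record in latest.items():
        snapshot[field] = record["value"]
    return snapshot


def utc(hour, minute):
    """Build a UTC timestamp on an arbitrary example day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Made-up measurements: temperature is reported twice, pH once.
records = [
    {"time": utc(12, 0), "field": "temperature", "value": 36.8},
    {"time": utc(12, 1), "field": "ph", "value": 7.1},
    {"time": utc(12, 2), "field": "temperature", "value": 37.0},
]
snapshot = latest_snapshot(records)
print(snapshot["temperature"], snapshot["ph"])  # 37.0 7.1
```

Each signal becomes a column, which is the shape most ML models expect as a single input row.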

03 · 🗂️ Model Inference

Airflow calls the Model Registry via REST endpoints to score the snapshot — no model code is installed on the platform. Inference happens where the model lives.
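The remote-scoring idea can be sketched with the standard library alone. The route `/models/<id>/predict`, the `inputs` payload key, the host, and the model name below are all hypothetical; the Model Registry's actual endpoints and request schema are not given here. The function only builds the request, so the example runs without a network.

```python
import json
from urllib.request import Request


def build_scoring_request(base_url, model_id, snapshot):
    """Build (but do not send) a JSON POST request that scores one snapshot."""
    # Hypothetical route; the real Model Registry API may differ.
    url = f"{base_url.rstrip('/')}/models/{model_id}/predict"
    body = json.dumps({"inputs": snapshot}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scoring_request(
    "http://registry.example/",  # placeholder host
    "biomass_model",             # placeholder model ID
    {"temperature": 37.0, "ph": 7.1},
)
print(request.get_method(), request.full_url)
# POST http://registry.example/models/biomass_model/predict
```

Sending it would be a call such as `urllib.request.urlopen(request, timeout=10)`. Since only JSON crosses the wire, the orchestrator needs no model dependencies of its own.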

04 · 💾 Prediction Storage

Predictions land in stamm_predictions, preserving the original snapshot timestamp and linking each output to the experimental observation that produced it. Every entry carries full metadata — model ID, version, source, and predicted property — so cross-bucket queries and dashboards stay coherent.
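The shape of a stored prediction might look like the record below. The four metadata items (model ID, version, source, predicted property) come from the description above, but the key names, the `tags`/`fields` layout, and the sample values are assumptions for illustration, not the actual STAMM schema.

```python
from datetime import datetime, timezone


def prediction_record(snapshot_time, value, model_id, version, source, predicted_property):
    """Assemble one prediction entry destined for the stamm_predictions bucket."""
    return {
        "bucket": "stamm_predictions",
        # Reuse the snapshot timestamp so the prediction lines up with the
        # observation that produced it.
        "time": snapshot_time,
        "tags": {
            "model_id": model_id,
            "version": version,
            "source": source,
            "predicted_property": predicted_property,
        },
        "fields": {"value": value},
    }


# Made-up values for illustration only.
snapshot_time = datetime(2024, 1, 1, 12, 2, tzinfo=timezone.utc)
record = prediction_record(
    snapshot_time, 4.2, "biomass_model", "1.0.0", "model_registry", "biomass"
)
print(record["time"] == snapshot_time)  # True
```

Keeping the timestamp identical to the snapshot's is what allows a prediction to be joined to its raw measurements in cross-bucket queries.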

Why it matters

🔁 Reproducible: The DAG is declarative, so every run produces the same outputs given the same inputs.
Event-driven: Reacts to new measurements instead of polling, so soft sensors stay in step with the process.
🧩 Extensible: The Registry, database, and DAG are decoupled, so any of them can be swapped without rewiring the others.

This integration of database, Model Registry, and workflow orchestration gives STAMM a reproducible, event-driven, and extensible execution layer. Soft-sensor models stay synchronized with data availability, version-controlled through the Model Registry, and seamlessly integrated into the unified time-series schema.