Time-series database
STAMM's data layer is a dedicated time-series store that organizes and serves process information, sensor readings, actuator data, and soft-sensor predictions. It is the single source of truth that every other module reads from: the Orchestrator, the Model Registry, the Drift Detectors, and the Dashboards.
The database is intentionally pluggable. STAMM ships with a stable InfluxDB backend that powers the current deployments, and a newer PostgreSQL adapter that opens the door to other time-series engines without changing how the rest of STAMM behaves.
InfluxDB 2.7
Production-tested, used in every STAMM deployment today. Buckets keep raw signals, predictions, and metadata cleanly separated.
PostgreSQL adapter
A new extension layer that lets STAMM talk to PostgreSQL and, by design, to other time-series databases in the future, all through a common interface.
Data organization
All data are structured around three logical collections, ensuring clear separation between raw process signals, model outputs, and metadata while maintaining a consistent schema for easy integration and analysis. In the InfluxDB backend these collections are dedicated buckets; in the PostgreSQL adapter they are dedicated schemas/tables with equivalent semantics.
- Raw Data (stamm_raw): collects real-time measurements from sensors, actuators, and control systems. Represents direct observations from the physical process, such as temperature, pH, agitation speed, or flow rate. Each entry includes contextual tags like device ID, project name, and batch ID, enabling detailed traceability across experiments and production runs.
- Model Predictions (stamm_predictions): stores outputs from machine-learning-based soft sensors: estimated variables and inferred values that are not directly measurable. Each prediction is linked to its corresponding model identifier and version, allowing model performance tracking and comparison over time. The structure mirrors that of the raw data, making it easy to align predictions with real measurements.
- Metadata (stamm_metadata): contains descriptive information for all observed properties, such as engineering units, display names, and preferred precision. Acts as a central registry ensuring that every variable is consistently represented across dashboards and analytical tools.
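To make the three collections concrete, here is a minimal Python sketch of what one entry in each might look like. The class names, field names, and tag keys (`device_id`, `project`, `batch_id`, `model_id`, `model_version`) are illustrative assumptions, not STAMM's actual schema; only the collection names and the `bioreactor_obs` measurement name come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

MEASUREMENT = "bioreactor_obs"  # shared measurement name used by all collections


@dataclass
class Point:
    """One time-stamped value with its tags (hypothetical layout)."""
    collection: str   # "stamm_raw" or "stamm_predictions"
    variable: str     # observed property, e.g. "temperature"
    value: float
    time: datetime
    tags: dict = field(default_factory=dict)
    measurement: str = MEASUREMENT


@dataclass
class PropertyMetadata:
    """Descriptive entry for one observed property (hypothetical layout)."""
    variable: str
    unit: str
    display_name: str
    precision: int
    measurement: str = MEASUREMENT


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# A raw sensor reading, tagged for traceability.
raw = Point("stamm_raw", "temperature", 37.1, now,
            tags={"device_id": "probe-01", "project": "demo", "batch_id": "B001"})

# A soft-sensor prediction, tagged with the model that produced it.
pred = Point("stamm_predictions", "biomass", 4.2, now,
             tags={"model_id": "biomass-softsensor", "model_version": "3",
                   "project": "demo", "batch_id": "B001"})

# Metadata describing how the raw variable should be displayed.
meta = PropertyMetadata("temperature", unit="degC",
                        display_name="Temperature", precision=1)
```

Because the raw and prediction entries share the same shape and the same `batch_id` tag, they can be matched up without any per-collection translation.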
Unified data model
All three collections share a common measurement name (bioreactor_obs) and
consistent tagging. This unified structure enables seamless queries and
comparisons between real data, model predictions, and metadata, supporting
cross-collection analytics for monitoring, diagnostics, and soft-sensor
validation.
The shared schema is what makes the backend pluggable: as long as a new adapter implements the same logical collections, tags, and measurement naming, the rest of STAMM doesn't need to change.
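The cross-collection comparison described above can be sketched in plain Python. This is an illustration under stated assumptions: the `align` helper, the dictionary keys, and the sample values are hypothetical, and a real deployment would run the equivalent join in the database's own query language.

```python
from datetime import datetime, timedelta, timezone


def align(raw_points, predicted_points):
    """Pair raw and predicted values that share batch, variable, and timestamp."""
    raw_index = {
        (p["batch_id"], p["variable"], p["time"]): p["value"] for p in raw_points
    }
    pairs = []
    for p in predicted_points:
        key = (p["batch_id"], p["variable"], p["time"])
        if key in raw_index:  # skip predictions with no matching measurement
            measured = raw_index[key]
            pairs.append({
                "time": p["time"],
                "variable": p["variable"],
                "measured": measured,
                "predicted": p["value"],
                "residual": p["value"] - measured,
            })
    return pairs


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
t1 = t0 + timedelta(minutes=5)
t2 = t0 + timedelta(minutes=10)

# Hypothetical sample data: two reference measurements, three predictions.
raw_points = [
    {"batch_id": "B001", "variable": "glucose", "time": t0, "value": 10.0},
    {"batch_id": "B001", "variable": "glucose", "time": t1, "value": 8.0},
]
predicted_points = [
    {"batch_id": "B001", "variable": "glucose", "time": t0, "value": 9.5},
    {"batch_id": "B001", "variable": "glucose", "time": t1, "value": 8.5},
    {"batch_id": "B001", "variable": "glucose", "time": t2, "value": 7.0},
]

pairs = align(raw_points, predicted_points)
residuals = [row["residual"] for row in pairs]
```

The join key only works because both collections use the same tags and variable naming, which is the property the unified data model is meant to guarantee.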
Extension layer
The PostgreSQL adapter is the first concrete step toward making STAMM's data layer backend-agnostic. Instead of every module talking directly to InfluxDB, modules now interact with the database through a thin extension interface. That gives operators two practical benefits:
- Bring your own time-series engine. Sites already running PostgreSQL (often with TimescaleDB or a similar extension) can deploy STAMM without introducing a second database alongside their existing infrastructure.
- Future backends without rewrites. As more time-series engines become relevant (cloud-native stores, OLAP columnar databases), they can be plugged in by implementing the same adapter contract.
InfluxDB remains the default and is fully supported. The PostgreSQL adapter is under active development; check the repository for the latest status.
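As a rough illustration of the adapter idea, the sketch below defines a small contract with `typing.Protocol` and an in-memory stand-in that satisfies it. Every name here (`TimeSeriesAdapter`, `InMemoryAdapter`, `write`, `query`, `latest_value`) is hypothetical and chosen for illustration; STAMM's actual extension interface may look quite different, so consult the repository for the real contract.

```python
from typing import Iterable, Optional, Protocol


class TimeSeriesAdapter(Protocol):
    """Hypothetical contract that every backend would implement."""

    def write(self, collection: str, points: Iterable[dict]) -> None: ...

    def query(self, collection: str, **tags: str) -> list: ...


class InMemoryAdapter:
    """Toy backend used only to show that callers depend on the contract."""

    def __init__(self) -> None:
        self._data = {}

    def write(self, collection: str, points: Iterable[dict]) -> None:
        self._data.setdefault(collection, []).extend(points)

    def query(self, collection: str, **tags: str) -> list:
        # Return points whose tags match every requested key/value pair.
        return [
            p for p in self._data.get(collection, [])
            if all(p.get("tags", {}).get(k) == v for k, v in tags.items())
        ]


def latest_value(db: TimeSeriesAdapter, collection: str, **tags: str) -> Optional[float]:
    """Module-side code: works with any adapter that follows the contract."""
    rows = db.query(collection, **tags)
    if not rows:
        return None
    return max(rows, key=lambda p: p["time"])["value"]


db = InMemoryAdapter()
db.write("stamm_raw", [
    {"time": 1, "value": 36.9, "tags": {"batch_id": "B001"}},
    {"time": 2, "value": 37.1, "tags": {"batch_id": "B001"}},
    {"time": 3, "value": 30.0, "tags": {"batch_id": "B002"}},
])

latest = latest_value(db, "stamm_raw", batch_id="B001")
```

In this sketch `latest_value` never mentions a specific database, so swapping `InMemoryAdapter` for an InfluxDB-backed or PostgreSQL-backed class with the same two methods would leave the calling code unchanged.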