Model Registry

The home for every soft sensor STAMM manages — storage, versioning, and lifecycle management with rich, structured metadata describing each model from identification to inputs and outputs.

Python · R: both ecosystems supported

YAML: declarative metadata per model

REST: every model is an HTTP endpoint

Streamlit: web UI to explore the registry

What the registry does

The Model Registry is the core component responsible for the storage, versioning, and lifecycle management of every deployed soft sensor in STAMM. It supports models developed in both Python and R, enabling smooth integration across diverse machine-learning workflows. Each model is stored alongside its configuration, artifacts, and validation results — a structured foundation for consistent and reliable deployment across industrial systems.

🗄️ Store: Model artifacts, scalers, and configuration files are stored together and are never separated from the metadata that describes them.

🔢 Version: Each retrain becomes a tracked version with its own metadata snapshot. Nothing is silently overwritten, so each model's lineage is preserved.

🛰️ Serve: Every registered model is exposed as a REST endpoint. The Orchestrator, Drift Detectors, and Dashboard call it over HTTP without installing the model code.
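
The "no silent overwrites" rule for versions can be illustrated with a small in-memory sketch. This is not STAMM's implementation: the class `InMemoryRegistry`, its methods, the version label `V.1.1`, and the metadata values are all hypothetical, chosen only to show the idea that registering an existing version is rejected rather than replaced.

```python
import copy
from dataclasses import dataclass, field


class VersionExistsError(Exception):
    """Raised when a (model ID, version) pair is already registered."""


@dataclass
class InMemoryRegistry:
    """Hypothetical stand-in for the registry's version bookkeeping."""

    # Maps model ID -> {version label -> metadata snapshot}.
    _models: dict = field(default_factory=dict)

    def register(self, model_id: str, version: str, metadata: dict) -> None:
        versions = self._models.setdefault(model_id, {})
        if version in versions:
            # Refuse to replace an existing version instead of overwriting it.
            raise VersionExistsError(f"{model_id} {version} is already registered")
        # Deep-copy so later edits by the caller cannot change the snapshot.
        versions[version] = copy.deepcopy(metadata)

    def versions(self, model_id: str) -> list:
        """Return version labels in registration order."""
        return list(self._models.get(model_id, {}))

    def metadata(self, model_id: str, version: str) -> dict:
        """Return a copy of the stored snapshot for one version."""
        return copy.deepcopy(self._models[model_id][version])


registry = InMemoryRegistry()
model_id = "0009_[Python]_penicillin_LSTM"
registry.register(model_id, "V.1.0", {"epochs": 100})
registry.register(model_id, "V.1.1", {"epochs": 150})  # illustrative retrain
```

Calling `registry.register(model_id, "V.1.0", ...)` a second time raises `VersionExistsError`, so the earlier snapshot stays untouched.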

What every model carries

The registry captures rich metadata that fully describes each soft sensor, from identification (name, version, author, creation date) to technical specifications such as the learner type, model architecture, training parameters, and input–output variables. For instance, an LSTM-based soft sensor built with TensorFlow, with scikit-learn used for feature scaling, has its layers, optimizer, loss function, and feature scaling fully declared.

🪪 Identification: Name · version · ID · UUID · author · DOI · creation date · status

🧠 Description: Learner type · architecture name · runtime language · required packages

🏗️ Architecture: Layers · units · activations · dropout · batch normalization

🎯 Training: Hyperparameters · optimizer · loss · callbacks · validation · experiment IDs

📥 Inputs: Per-feature: type · units · expected range · lag · scaling

📤 Outputs: Predicted property · units · forecast horizon · expected range · scaler
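
The six groups above correspond to the top-level sections of the example metadata.yaml (`model_identification`, `model_description`, `model_architecture`, `training_information`, `inputs`, `outputs`). As a rough sketch, a completeness check over an already-parsed metadata dictionary might look like the following. Treating all six sections as mandatory is an assumption, and `missing_sections` is a hypothetical helper, not part of STAMM.

```python
# Section names taken from the example metadata.yaml; requiring all of them
# is an assumption made for this sketch.
REQUIRED_SECTIONS = (
    "model_identification",
    "model_description",
    "model_architecture",
    "training_information",
    "inputs",
    "outputs",
)


def missing_sections(metadata: dict) -> list:
    """List the required sections absent from a parsed metadata file."""
    body = metadata.get("ml_model_configuration", {})
    return [name for name in REQUIRED_SECTIONS if name not in body]


# A deliberately incomplete metadata dictionary for illustration.
example = {
    "ml_model_configuration": {
        "model_identification": {"name": "LSTM"},
        "inputs": {},
        "outputs": {},
    }
}
print(missing_sections(example))
```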

📘 Example metadata.yaml (LSTM soft sensor):

ml_model_configuration:
  model_identification:
    name: "LSTM"
    version: "V.1.0"
    ID: "0009_[Python]_penicillin_LSTM"
    UUID: ""
    author: "Suarez, C., Astudillo A., Metcalfe B., Koehorst J.J., Castillo E. & Corrales, D. C"
    doi: ""
    creation_date: "17-02-2025"
    project: "../project_info.yaml"
    status: "online"
    status_description: "Model is loaded and ready for predictions."

  model_description:
    learner: "Long short-term memory (LSTM)"
    model_type: "black box"
    model_name: "LSTM without lag features [Python version]"
    description: "This is not the soft sensor using the LSTM with lag features proposed in the paper (Metcalfe et al., 2025)."
    language:
      - name: "python"
        version: "3.12.7"
    config_files:
      model_file: "0009_[Python]_penicillin_LSTM.keras"
    packages:
      - package: "tensorflow"
        version: "2.18.0"
        classes:
          - "keras.layers.LSTM"
          - "keras.layers.Dropout"
          - "keras.layers.BatchNormalization"
          - "keras.layers.Dense"
          - "keras.models.Sequential"
          - "keras.callbacks.EarlyStopping"
          - "keras.optimizers.Adam"
      - package: "sklearn"
        version: "1.5.2"
        classes:
          - "preprocessing.MinMaxScaler"

    input_time_interval:
      time_interval:
        value: 12
        unit: "minutes"
      aggregation:
        method: "NaN"
        description: "NaN"
      description: "One measurement every 12 minutes"

  model_architecture:
    input_layer:
      description: "The input layer consists of time-series data with each sample having 1 time step and features corresponding to the process variables."
      shape: "(None, 1, 8)"
    lstm:
      - layer: 1
        units: 256
        activation_function: "ReLU"
        description: "The first LSTM layer has 256 units with ReLU activation. This layer processes the time-series data and extracts temporal features."
      - layer: 2
        units: 256
        activation_function: "ReLU"
        description: "The second LSTM layer has 256 units with ReLU activation. It refines the temporal features and passes them to the next layer."
    layers:
      - layer: 3
        name: "dropout"
        rate: 0.2
        description: "A dropout layer with a rate of 0.2 is applied after the LSTM layers to mitigate overfitting."
      - layer: 4
        name: "batch_normalization"
        description: "Batch normalization is applied to stabilize and accelerate the training process by normalizing the activations of the previous layers."
      - layer: 5
        name: "dense"
        units: 256
        activation_function: "ReLU"
        description: "The dense layer with 256 units and ReLU activation is used for further processing before the output layer."
    output_layer:
      activation_function: "ReLU"
      description: "The output layer consists of a single neuron with ReLU activation, predicting the penicillin concentration."
      shape: "(None, 1)"

  training_information:
    number_of_instances: 89800
    hyperparameters:
      batch_size: 512
      epochs: 100
      optimizer:
        name: "Adam"
        learning_rate: 0.001
        description: "The Adam optimizer is used for training."
      loss_function:
        name: "Mean Squared Error (MSE)"
        description: "MSE is used as the loss function to minimize the squared differences between predicted and actual values of penicillin concentration."
      callbacks:
        early_stopping:
          monitor: "val_loss"
          patience: 10
          description: "Early stopping is used to halt training when the validation loss stops improving for 10 consecutive epochs."
    validation: "Train-test split with early stopping"
    experiments_ID: [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 31, 33, 34, 36, 37, 38, 39, 40, 42, 43, 44, 45, 47, 48, 49, 50, 51, 53, 54, 55, 56, 57, 58, 59, 61, 62, 65, 66, 67, 69, 70, 71, 72, 73, 74, 75, 76, 77, 80, 81, 82, 83, 84, 85, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 98, 99]

  inputs:
    scaler: "0009_[Python]_penicillin_LSTM_features_scaler.pkl"
    features:
      - name: "temperature"
        type: "sensor"
        description: "Current temperature (T) in the bioreactor."
        units: "K"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 298   # 25 °C
          max: 308   # 35 °C

      - name: "pH"
        type: "sensor"
        description: "Current pH level in the bioreactor."
        units: "pH"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 5.5
          max: 7.5

      - name: "dissolved_oxygen_concentration"
        type: "sensor"
        description: "Dissolved oxygen (DO) concentration in the bioreactor."
        units: "mg/L"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 0
          max: 10

      - name: "agitator"
        type: "actuator"
        description: "Agitation speed in revolutions per minute."
        units: "rpm"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 100
          max: 1200

      - name: "CO2_percent_in_off_gas"
        type: "sensor"
        description: "CO2 percentage in the off-gas (CO2,og)."
        units: "%"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 0
          max: 10

      - name: "oxygen_in_percent_in_off_gas"
        type: "sensor"
        description: "O2 percentage in the off-gas (O2,og)."
        units: "%"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 10
          max: 21

      - name: "vessel_volume"
        type: "computed_variable"
        description: "Total volume of the bioreactor vessel (V)."
        units: "L"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 1
          max: 1000

      - name: "sugar_feed_rate"
        type: "actuator"
        description: "Sugar feed rate (Fs) into the bioreactor."
        units: "L/h"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 0
          max: 2

  outputs:
    scaler: "0009_[Python]_penicillin_LSTM_target_scaler.pkl"
    information:
      - name: "penicillin_concentration"
        description: "Prediction of the penicillin concentration."
        units: "g L−1"
        forecast_horizon: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 0
          max: 50
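
The `expected_range` declared for each input lends itself to a simple plausibility check before a snapshot is scored. The sketch below is one possible way to use it, not a description of how STAMM does: `out_of_range` is a hypothetical helper, the bounds are treated as inclusive by assumption, and `FEATURES` is a hand-copied excerpt of the `inputs.features` list above rather than a parsed file.

```python
# Excerpt of inputs.features from the example metadata, copied by hand.
FEATURES = [
    {"name": "temperature", "expected_range": {"min": 298, "max": 308}},
    {"name": "pH", "expected_range": {"min": 5.5, "max": 7.5}},
]


def out_of_range(snapshot: dict, features: list) -> dict:
    """Return {feature name: value} for values missing or outside the range."""
    problems = {}
    for feature in features:
        name = feature["name"]
        bounds = feature["expected_range"]
        value = snapshot.get(name)
        # Bounds are treated as inclusive (an assumption of this sketch).
        if value is None or not (bounds["min"] <= value <= bounds["max"]):
            problems[name] = value
    return problems


# 310 K lies above the declared maximum of 308 K, so temperature is flagged.
print(out_of_range({"temperature": 310, "pH": 6.5}, FEATURES))
```

A missing feature is reported with the value `None`, which keeps absent readings distinguishable from out-of-range ones.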

How it plugs in

🖥️ Streamlit explorer UI: A dedicated Streamlit-based interface for visualizing, exploring, and managing every model stored in the registry, accessible from the same browser as the Dashboard.

🔌 REST endpoints: Every registered model is reachable over HTTP. The Orchestrator scores fresh snapshots through these endpoints, and external apps can integrate without any STAMM-specific glue.

🤝 Read by every module: The Dashboard, Drift Detectors, and Orchestrator all read the same metadata, so what you declare in the registry shows up consistently across the system.
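
To make the REST integration concrete, here is a minimal client sketch using only the Python standard library. The base URL, the `/models/<id>/predict` route, the `{"inputs": ...}` payload shape, and the JSON reply are all assumptions made for illustration; the actual endpoint layout should be taken from the STAMM API itself.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL and route; the real STAMM paths may differ.
BASE_URL = "http://localhost:8000"


def build_predict_request(model_id: str, snapshot: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one input snapshot."""
    # Percent-encode the ID, since IDs such as "0009_[Python]_..." contain brackets.
    path = f"/models/{urllib.parse.quote(model_id, safe='')}/predict"
    body = json.dumps({"inputs": snapshot}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(model_id: str, snapshot: dict, timeout: float = 5.0) -> dict:
    """Send the snapshot and return the decoded JSON reply (shape assumed)."""
    request = build_predict_request(model_id, snapshot)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only two of the eight declared features are shown; values are illustrative.
request = build_predict_request(
    "0009_[Python]_penicillin_LSTM",
    {"temperature": 300.0, "pH": 6.5},
)
print(request.full_url)
```

Separating request construction from sending keeps the payload easy to inspect and test without a running registry.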
