🗂️ Model Registry

The Model Registry is the core component responsible for the storage, versioning, and lifecycle management of all deployed soft sensors within STAMM. It supports models developed in both Python and R, enabling smooth integration across diverse machine learning workflows. Each model is stored along with its configuration, artifacts, and validation results, providing a structured foundation for consistent and reliable deployment across industrial systems.

Beyond simple storage, the registry captures rich metadata that fully describes each soft sensor, from identification (name, version, author, and creation date) to technical specifications such as the learner type, model architecture, training parameters, and input–output variables. For instance, the metadata may define an LSTM-based soft sensor built with TensorFlow and scikit-learn, specifying its layers, optimizer, loss function, and feature scaling. Below is an example of a metadata configuration file used by STAMM to describe a deployed soft sensor. It defines the model’s identification, architecture, training information, and input/output variables.

📘 View example YAML

ml_model_configuration:
  model_identification:
    name: "LSTM"
    version: "V.1.0"
    ID: "0009_[Python]_penicillin_LSTM"
    UUID: ""
    author: "Suarez, C., Astudillo A., Metcalfe B., Koehorst J.J., Castillo E. & Corrales, D. C"  
    doi: ""
    creation_date: "17-02-2025"
    project: "../project_info.yaml"
    status: "online"
    status_description: "Model is loaded and ready for predictions."      

  model_description:
    learner: "Long short-term memory (LSTM)"
    model_type: "black box"   
    model_name: "LSTM without lag features [Python version]"
    description: "This is not the soft sensor using the LSTM with lag features proposed in the paper (Metcalfe et al., 2025)."
    language: 
      - name: "python"
        version: "3.12.7"
    config_files: 
      model_file: "0009_[Python]_penicillin_LSTM.keras"
    packages:
      - package: "tensorflow"
        version: "2.18.0"
        classes:
          - "keras.layers.LSTM"
          - "keras.layers.Dropout"
          - "keras.layers.BatchNormalization"
          - "keras.layers.Dense"
          - "keras.models.Sequential"
          - "keras.callbacks.EarlyStopping"
          - "keras.optimizers.Adam"
      - package: "sklearn"
        version: "1.5.2"
        classes:
          - "preprocessing.MinMaxScaler"                          

    input_time_interval: 
      time_interval:
        value: 12
        unit: "minutes"
      aggregation:
        method: "NaN"
        description: "NaN"            
      description: "One measurement every 12 minutes"            

  model_architecture:
    input_layer:
      description: "The input layer consists of time-series data with each sample having 1 time step and features corresponding to the process variables."
      shape: "(None, 1, 8)" 
    lstm:
      - layer: 1
        units: 256
        activation_function: "ReLU"
        description: "The first LSTM layer has 256 units with ReLU activation. This layer processes the time-series data and extracts temporal features."
      - layer: 2
        units: 256
        activation_function: "ReLU"      
        description: "The second LSTM layer has 256 units with ReLU activation. It refines the temporal features and passes them to the next layer."
    layers:        
      - layer: 3
        name: "dropout"
        rate: 0.2
        description: "A dropout layer with a rate of 0.2 is applied after the LSTM layers to mitigate overfitting."
      - layer: 4  
        name: "batch_normalization"
        description: "Batch normalization is applied to stabilize and accelerate the training process by normalizing the activations of the previous layers."
      - layer: 5  
        name: "dense"
        units: 256
        activation_function: "ReLU"        
        description: "The dense layer with 256 units and ReLU activation is used for further processing before the output layer."
    output_layer:
      activation_function: "ReLU"    
      description: "The output layer consists of a single neuron with ReLU activation, predicting the penicillin concentration."
      shape: "(None, 1)"

  training_information:            
    number_of_instances: 89800  
    hyperparameters:
      batch_size: 512
      epochs: 100
      optimizer: 
        name: "Adam"
        learning_rate: 0.001
        description: "The Adam optimizer is used for training."
      loss_function: 
        name: "Mean Squared Error (MSE)"
        description: "MSE is used as the loss function to minimize the squared differences between predicted and actual values of penicillin concentration."
      callbacks:
        early_stopping: 
          monitor: "val_loss"
          patience: 10
          description: "Early stopping is used to halt training when the validation loss stops improving for 10 consecutive epochs."
    validation: "Train-test split with early stopping"          
    experiments_ID: [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 31, 33, 34, 36, 37, 38, 39, 40, 42, 43, 44, 45, 47, 48, 49, 50, 51, 53, 54, 55, 56, 57, 58, 59, 61, 62, 65, 66, 67, 69, 70, 71, 72, 73, 74, 75, 76, 77, 80, 81, 82, 83, 84, 85, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 98, 99]

  inputs:
    scaler: "0009_[Python]_penicillin_LSTM_features_scaler.pkl"
    features:
      - name: "temperature"
        type: "sensor"
        description: "Current temperature (T) in the bioreactor."
        units: "K"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 298   # 25 °C
          max: 308   # 35 °C

      - name: "pH"
        type: "sensor"
        description: "Current pH level in the bioreactor."
        units: "pH"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 5.5
          max: 7.5

      - name: "dissolved_oxygen_concentration"
        type: "sensor"
        description: "Dissolved oxygen (DO) concentration in the bioreactor."
        units: "mg/L"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 0
          max: 10

      - name: "agitator"
        type: "actuator"
        description: "Agitation speed in revolutions per minute."
        units: "rpm"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 100
          max: 1200

      - name: "CO2_percent_in_off_gas"
        type: "sensor"
        description: "CO2 percentage in the off-gas (CO2,og)."
        units: "%"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 0
          max: 10

      - name: "oxygen_in_percent_in_off_gas"
        type: "sensor"
        description: "O2 percentage in the off-gas (O2,og)."
        units: "%"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 10
          max: 21

      - name: "vessel_volume"
        type: "computed_variable"
        description: "Total volume of the bioreactor vessel (V)."
        units: "L"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 1
          max: 1000  

      - name: "sugar_feed_rate"
        type: "actuator"
        description: "Sugar feed rate (Fs) into the bioreactor."
        units: "L/h"
        lag: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 0
          max: 2

  outputs:
    scaler: "0009_[Python]_penicillin_LSTM_target_scaler.pkl"
    information:
      - name: "penicillin_concentration"
        description: "Prediction of the penicillin concentration."
        units: "g L−1"
        forecast_horizon: 0
        feature_scaling: "Min-Max normalization"
        expected_range:
          min: 0
          max: 50
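
As a rough illustration of how the `inputs.features` section could be used at prediction time, the sketch below checks an incoming sample against each feature's `expected_range` and arranges the values to match the `(None, 1, 8)` input shape. The `FEATURES` list mirrors what a YAML parser would return for that section. The function names `check_ranges` and `to_model_input`, the sample values, and the assumption that the model expects features in the order they are listed are all hypothetical and are not taken from STAMM.

```python
# Names and expected ranges copied from `inputs.features` in the example
# metadata. Assumption: the listing order equals the model's input order.
FEATURES = [
    {"name": "temperature", "expected_range": {"min": 298, "max": 308}},
    {"name": "pH", "expected_range": {"min": 5.5, "max": 7.5}},
    {"name": "dissolved_oxygen_concentration", "expected_range": {"min": 0, "max": 10}},
    {"name": "agitator", "expected_range": {"min": 100, "max": 1200}},
    {"name": "CO2_percent_in_off_gas", "expected_range": {"min": 0, "max": 10}},
    {"name": "oxygen_in_percent_in_off_gas", "expected_range": {"min": 10, "max": 21}},
    {"name": "vessel_volume", "expected_range": {"min": 1, "max": 1000}},
    {"name": "sugar_feed_rate", "expected_range": {"min": 0, "max": 2}},
]


def check_ranges(sample):
    """Return a list of problems: missing features or out-of-range values."""
    problems = []
    for feature in FEATURES:
        name = feature["name"]
        if name not in sample:
            problems.append(f"{name}: missing")
            continue
        low = feature["expected_range"]["min"]
        high = feature["expected_range"]["max"]
        if not low <= sample[name] <= high:
            problems.append(f"{name}: {sample[name]} outside [{low}, {high}]")
    return problems


def to_model_input(sample):
    """Arrange one sample as a nested list: 1 sample, 1 time step, 8 features."""
    return [[[float(sample[feature["name"]]) for feature in FEATURES]]]


# Illustrative measurement, not real process data.
sample = {
    "temperature": 300.0,
    "pH": 6.5,
    "dissolved_oxygen_concentration": 4.0,
    "agitator": 600,
    "CO2_percent_in_off_gas": 2.0,
    "oxygen_in_percent_in_off_gas": 19.0,
    "vessel_volume": 500.0,
    "sugar_feed_rate": 1.0,
}

print(check_ranges(sample))
print(to_model_input(sample))
```

In a real pipeline the values would presumably still need to be scaled with the stored feature scaler (`0009_[Python]_penicillin_LSTM_features_scaler.pkl`) before being passed to the model, since the metadata lists Min-Max normalization for every feature.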

The Model Registry also includes a dedicated Streamlit-based interface that allows users to easily visualize, explore, and manage all models stored within the registry. Each registered model is exposed as a REST API endpoint, enabling seamless integration with other STAMM modules, external applications, and automated orchestration workflows.
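
The endpoint layout is not documented here, so the following is only a sketch of how a client might prepare a prediction request for a registered model. The base URL, the `/models/{model_id}/predict` path, and the JSON payload shape are all assumptions made for illustration. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address of a running registry instance.
BASE_URL = "http://localhost:8000"


def build_prediction_request(model_id, features):
    """Build (but do not send) a POST request for one prediction.

    The path and payload schema are assumptions, not the STAMM API.
    """
    # Percent-encode the ID, since registry IDs contain characters such as "[".
    path = f"/models/{urllib.parse.quote(model_id, safe='')}/predict"
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request(
    "0009_[Python]_penicillin_LSTM",
    {"temperature": 300.0, "pH": 6.5},
)
print(request.get_method(), request.full_url)
```

Once the actual endpoint and schema are known, the request could be sent with `urllib.request.urlopen(request)` and the JSON response decoded with `json.load`.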

Explore the source code and contribute on GitLab: