ML/AI Best Practices: “Selecting Surrogate Model Form/Size for Optimization”

ML/AI Best Practices: “Selecting Surrogate Model Form/Size for Optimization”#

Maintainer: Brandon Paul
Author: Brandon Paul
Updated: 2023-06-01

In this notebook we demonstrate the use of model and solver statistics to select the best surrogate model. For this purpose we trained (offline) different models with ALAMO, PySMO for three basis forms, and TensorFlow Keras. The surrogates are imported into the notebook, and the IDAES flowsheet is constructed and solved.

1. Introduction#

This example demonstrates autothermal reformer optimization leveraging the ALAMO, PySMO and Keras surrogate trainers, and compares key indicators of model performance. In this notebook, IPOPT will be run with statistics using ALAMO, PySMO Polynomial, PySMO RBF, PySMO Kriging and Keras surrogate models to assess each model type for flowsheet integration and tractability.

2. Problem Statement#

Within the context of a larger Natural Gas Fuel Cell (NGFC) system, the autothermal reformer unit is used to generate syngas from air, steam, and natural gas. Two input variables are considered for this example (reformer bypass fraction and fuel to steam ratio). The reformer bypass fraction (also called internal reformation percentage) plays an important role in the syngas final composition and it is typically controlled in this process. The fuel to steam ratio is an important variable that affects the final syngas reaction and heat duty required by the reactor. The syngas is then used as fuel by a solid-oxide fuel cell (SOFC) to generate electricity and heat.

The autothermal reformer is typically modeled using the IDAES Gibbs reactor and this reactor is robust once it is initialized; however, the overall model robustness is affected due to several components present in the reaction, scaling issues for the largrangean multipliers, and Gibbs free energy minimization formulation. Substituting rigorously trained and validated surrogates in lieu of rigorous unit model equations increases the robustness of the problem.

2.1. Inputs:#

Bypass fraction (dimensionless) - split fraction of natural gas to bypass AR unit and feed directly to the power island
NG-Steam Ratio (dimensionless) - proportion of natural relative to steam fed into AR unit operation

2.2. Outputs:#

Steam flowrate (kg/s) - inlet steam fed to AR unit
Reformer duty (kW) - required energy input to AR unit
Composition (dimensionless) - outlet mole fractions of components (Ar, C2H6, C3H8, C4H10, CH4, CO, CO2, H2, H2O, N2, O2)

from IPython.display import Image
from pathlib import Path

def datafile_path(name):
    return Path(".") / name

Image(datafile_path("AR_PFD.png"))

../../_images/1a3b4ae911ad91d018a0eda19303684d24f174506309b4831ccbdaefc6dc984b.png

3. Training Surrogates#

Previous Jupyter Notebooks demonstrated the workflow to import data, train surrogate models using ALAMO, PySMO and Keras, and develop IDAES’s validation plots. To keep this notebook simple, this notebook simply loads the surrogate models trained off line.

Note that the training/loading method includes a “retrain” argument in case the user wants to retrain all surrogate models. Since the retrain method runs ALAMO, PySMO (Polynomial, Radial Basis Functions, and Kriging basis types) and Keras, it takes about an 1 hr to complete the training for all models.

Each run will overwrite the serialized JSON files for previously trained surrogates if retraining is enforced. To retrain individual surrogates, simply delete the desired JSON before running this notebook (for Keras, delete the folder keras_surrogate/)

from idaes_examples.mod.surrogates.AR_training_methods import (
    train_load_surrogates,
    SurrType,
)

trained_surr = train_load_surrogates(retrain=False)
# setting retrain to True will take ~ 1 hour to run, best to load if possible
# setting retrain to False will only generate missing surrogates (only if JSON/folder doesn't exist)
# this method trains surrogates and serializes to JSON
# The return value is a set of surrogate types (instances of SurrType) that were trained

# imports to capture long output
from io import StringIO
import sys

2024-05-09 18:35:06.820728: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-09 18:35:06.870695: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-09 18:35:06.871660: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

2024-05-09 18:35:07.797537: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

WARNING: DEPRECATED: Declaring class 'OmltBlockData' derived from
'_BlockData'. The class '_BlockData' has been renamed to 'BlockData'.
(deprecated in 6.7.2.dev0) (called from
/home/docs/checkouts/readthedocs.org/user_builds/idaes-
examples/envs/latest/lib/python3.8/site-packages/omlt/block.py:33)

Loading existing surrogate models and training missing models.
Any training output will print below; otherwise, models will be loaded without any further output.
Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [0.430492 0.210838] 
Mean: [[0.690696]] 
Regularization parameter: 1.000000000001e-06

Results saved in  solution.pickle
2024-05-09 18:35:13 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output Steam_Flow trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [2.530486 0.042161] 
Mean: [[24493.925303]] 
Regularization parameter: 1.000000000001e-06

Results saved in  solution.pickle
2024-05-09 18:35:15 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output Reformer_Duty trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [5.666813 0.019873] 
Mean: [[0.003363]] 
Regularization parameter: 1.000000000001e-06

Results saved in  solution.pickle
2024-05-09 18:35:15 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output AR trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [5.721231 0.130094] 
Mean: [[0.007353]] 
Regularization parameter: 1e-06

Results saved in  solution.pickle
2024-05-09 18:35:17 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output C2H6 trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [8.928968 0.002469] 
Mean: [[0.001579]] 
Regularization parameter: 1.000000000001e-06

Results saved in  solution.pickle
2024-05-09 18:35:18 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output C3H8 trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [10.418593  0.005828] 
Mean: [[0.000834]] 
Regularization parameter: 1.000000000001e-06

Results saved in  solution.pickle
2024-05-09 18:35:20 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output C4H10 trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [10.796722  0.003864] 
Mean: [[0.211277]] 
Regularization parameter: 1.0000000001467169e-06

Results saved in  solution.pickle
2024-05-09 18:35:22 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output CH4 trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [4.782906 0.112192] 
Mean: [[0.087155]] 
Regularization parameter: 1.000000000001e-06

Results saved in  solution.pickle
2024-05-09 18:35:23 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output CO trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [3.598394 0.23176 ] 
Mean: [[0.037778]] 
Regularization parameter: 1e-06

Results saved in  solution.pickle
2024-05-09 18:35:25 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output CO2 trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [6.876494 0.006946] 
Mean: [[0.231256]] 
Regularization parameter: 1.0000000002924336e-06

Results saved in  solution.pickle
2024-05-09 18:35:27 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output H2 trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [0.759585 0.236697] 
Mean: [[0.056419]] 
Regularization parameter: 1.000000000001e-06

Results saved in  solution.pickle
2024-05-09 18:35:28 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output H2O trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [5.859358 0.010898] 
Mean: [[0.292405]] 
Regularization parameter: 1.000000000001e-06

Results saved in  solution.pickle
2024-05-09 18:35:29 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output N2 trained successfully

Warning: solution.pickle already exists; previous file will be overwritten.

Optimizing kriging parameters using L-BFGS-B algorithm...

Final results
================
Theta: [0.911295 0.23918 ] 
Mean: [[0.]] 
Regularization parameter: 2.80099773484338e-06

Results saved in  solution.pickle
2024-05-09 18:35:31 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output O2 trained successfully

4. Build and Run IDAES Flowsheet#

This step builds an IDAES flowsheet and imports the surrogate model objects. As shown in the prior three examples, a single model object accounts for all input and output variables, and the JSON model serialized earlier may be imported into a single SurrogateBlock() component. While the serialization method and file structure differs slightly between the ALAMO, PySMO and Keras Python Wrappers, the three are imported similarly into IDAES flowsheets as shown below.

4.1 Build IDAES Flowsheet#

This method builds an instance of the IDAES flowsheet model and solves the flowsheet using IPOPT. The method allows users to select a case and the surrogate model type to be used (i.e., alamo, pysmo, keras). The case argument consists of a list with values for the input variables (in this order, bypass split fraction and natural gas to steam ratio). Then the method fixes the input variables values to solve a square problem with IPOPT.

# Import IDAES and Pyomo libraries
from pyomo.environ import ConcreteModel, SolverFactory, value, Var, Constraint, Set
from idaes.core.surrogate.surrogate_block import SurrogateBlock
from idaes.core.surrogate.alamopy import AlamoSurrogate
from idaes.core.surrogate.pysmo_surrogate import PysmoSurrogate
from idaes.core.surrogate.keras_surrogate import KerasSurrogate
from idaes.core import FlowsheetBlock


def build_flowsheet(case, surrogate_type: SurrType = None):
    print(case, " ", surrogate_type.value)
    # create the IDAES model and flowsheet
    m = ConcreteModel()
    m.fs = FlowsheetBlock(dynamic=False)

    # create flowsheet input variables
    m.fs.bypass_frac = Var(
        initialize=0.80, bounds=[0.1, 0.8], doc="natural gas bypass fraction"
    )
    m.fs.ng_steam_ratio = Var(
        initialize=0.80, bounds=[0.8, 1.2], doc="natural gas to steam ratio"
    )

    # create flowsheet output variables
    m.fs.steam_flowrate = Var(initialize=0.2, doc="steam flowrate")
    m.fs.reformer_duty = Var(initialize=10000, doc="reformer heat duty")
    m.fs.AR = Var(initialize=0, doc="AR fraction")
    m.fs.C2H6 = Var(initialize=0, doc="C2H6 fraction")
    m.fs.C3H8 = Var(initialize=0, doc="C3H8 fraction")
    m.fs.C4H10 = Var(initialize=0, doc="C4H10 fraction")
    m.fs.CH4 = Var(initialize=0, doc="CH4 fraction")
    m.fs.CO = Var(initialize=0, doc="CO fraction")
    m.fs.CO2 = Var(initialize=0, doc="CO2 fraction")
    m.fs.H2 = Var(initialize=0, doc="H2 fraction")
    m.fs.H2O = Var(initialize=0, doc="H2O fraction")
    m.fs.N2 = Var(initialize=0, doc="N2 fraction")
    m.fs.O2 = Var(initialize=0, doc="O2 fraction")

    # create input and output variable object lists for flowsheet
    inputs = [m.fs.bypass_frac, m.fs.ng_steam_ratio]
    outputs = [
        m.fs.steam_flowrate,
        m.fs.reformer_duty,
        m.fs.AR,
        m.fs.C2H6,
        m.fs.C3H8,
        m.fs.C4H10,
        m.fs.CH4,
        m.fs.CO,
        m.fs.CO2,
        m.fs.H2,
        m.fs.H2O,
        m.fs.N2,
        m.fs.O2,
    ]

    # create the Pyomo/IDAES block that corresponds to the surrogate
    # call correct PySMO object to use below (will let us avoid nested switches)

    # capture long output from loading surrogates (don't need to print it)
    stream = StringIO()
    oldstdout = sys.stdout
    sys.stdout = stream

    if surrogate_type == SurrType.ALAMO:
        surrogate = AlamoSurrogate.load_from_file("alamo_surrogate.json")
        m.fs.surrogate = SurrogateBlock()
        m.fs.surrogate.build_model(surrogate, input_vars=inputs, output_vars=outputs)
    elif surrogate_type == SurrType.KERAS:
        keras_surrogate = KerasSurrogate.load_from_folder("keras_surrogate")
        m.fs.surrogate = SurrogateBlock()
        m.fs.surrogate.build_model(
            keras_surrogate,
            formulation=KerasSurrogate.Formulation.FULL_SPACE,
            input_vars=inputs,
            output_vars=outputs,
        )
    elif SurrType.is_pysmo(
        surrogate_type
    ):  # surrogate is one of the three pysmo basis options
        surrogate = PysmoSurrogate.load_from_file(
            surrogate_type.value + "_surrogate.json"
        )
        m.fs.surrogate = SurrogateBlock()
        m.fs.surrogate.build_model(surrogate, input_vars=inputs, output_vars=outputs)
    else:
        raise ValueError(f"Unknown surrogate type: {surrogate_type}")

    # revert to standard output
    sys.stdout = oldstdout

    # fix input values and solve flowsheet
    m.fs.bypass_frac.fix(case[0])
    m.fs.ng_steam_ratio.fix(case[1])

    solver = SolverFactory("ipopt")
    try:  # attempt to solve problem
        results = solver.solve(m, tee=True)
    except:  # retry solving one more time
        results = solver.solve(m, tee=True)

    return (
        value(m.fs.steam_flowrate),
        value(m.fs.reformer_duty),
        value(m.fs.C2H6),
        value(m.fs.CH4),
        value(m.fs.H2),
        value(m.fs.O2),
    )

4.2 Model Size/Form Comparison#

As mentioned above, as part of best practices the IDAES ML/AI demonstration includes the analysis of model/solver statistics and performance to determine the best surrogate model, including model size, model form, model trainer, etc. This section provides the rigorous analysis of solver performance comparing different surrogate models (ALAMO, PySMO polynomial, PysMO RBF, and PySMO Kriging).

To obtain the results, we run the flowsheet for ten different simulation cases for each surrogate model type. Since the simulation cases are obtained from the training data set we can compare model performance (absolute error of measurement vs predicted output values).

# Import Auto-reformer training data
import numpy as np
import pandas as pd

np.set_printoptions(precision=6, suppress=True)
csv_data = pd.read_csv(r"reformer-data.csv")  # 2800 data points

# extracting 10 data points out of 2800 data points, randomly selecting 10 cases to run
case_data = csv_data.sample(n=10)

# selecting columns that correspond to Input Variables
inputs = np.array(case_data.iloc[:, :2])

# selecting columns that correspond to Output Variables
cols = ["Steam_Flow", "Reformer_Duty", "C2H6", "CH4", "H2", "O2"]
outputs = np.array(case_data[cols])

# For results comparison with minimum memory usage we will extract the values to plot on each pass
# note that the entire model could be returned and saved on each loop if desired

# create empty dictionaries so we may easily index results as we save them
# for convenience while plotting, each output variable has its own dictionary
# indexed by (case number, trainer type)
# trainers = ["alamo", "pysmo_poly", "pysmo_rbf", "pysmo_krig", "keras"]
# temporarily remove keras
trainers = list(trained_surr - {SurrType.KERAS})

cases = range(len(inputs))
steam_flow_error = {}
reformer_duty_error = {}
conc_C2H6 = {}
conc_CH4 = {}
conc_H2 = {}
conc_O2 = {}

# run flowsheet for each trainer and save results
i = 0
for case in inputs:  # each case is a value pair (bypass_frac, ng_steam_ratio)
    i = i + 1
    for trainer in trainers:
        [sf, rd, eth, meth, hyd, oxy,] = build_flowsheet(
            case, surrogate_type=trainer
        )
        steam_flow_error[(i, trainer)] = abs(
            (sf - value(outputs[i - 1, 0])) / value(outputs[i - 1, 0])
        )
        reformer_duty_error[(i, trainer)] = abs(
            (rd - value(outputs[i - 1, 1])) / value(outputs[i - 1, 1])
        )
        conc_C2H6[(i, trainer)] = abs(eth - value(outputs[i - 1, 2]))
        conc_CH4[(i, trainer)] = abs(meth - value(outputs[i - 1, 3]))
        conc_H2[(i, trainer)] = abs(hyd - value(outputs[i - 1, 4]))
        conc_O2[(i, trainer)] = abs(oxy - value(outputs[i - 1, 5]))

[0.221739 1.021053]   pysmo_kriging
2024-05-09 18:35:32 [INFO] idaes.core.surrogate.pysmo_surrogate: Decode surrogate. type=kriging

WARNING: Could not locate the 'ipopt' executable, which is required for solver
ipopt

WARNING: Could not locate the 'ipopt' executable, which is required for solver
ipopt

---------------------------------------------------------------------------
ApplicationError                          Traceback (most recent call last)
Cell In[4], line 98, in build_flowsheet(case, surrogate_type)
     97 try:  # attempt to solve problem
---> 98     results = solver.solve(m, tee=True)
     99 except:  # retry solving one more time

File ~/checkouts/readthedocs.org/user_builds/idaes-examples/envs/latest/lib/python3.8/site-packages/pyomo/opt/base/solvers.py:534, in OptSolver.solve(self, *args, **kwds)
    532 """Solve the problem"""
--> 534 self.available(exception_flag=True)
    535 #
    536 # If the inputs are models, then validate that they have been
    537 # constructed! Collect suffix names to try and import from solution.
    538 #

File ~/checkouts/readthedocs.org/user_builds/idaes-examples/envs/latest/lib/python3.8/site-packages/pyomo/opt/solver/shellcmd.py:140, in SystemCallSolver.available(self, exception_flag)
    139     msg = "No executable found for solver '%s'"
--> 140     raise ApplicationError(msg % self.name)
    141 return False

ApplicationError: No executable found for solver 'ipopt'

During handling of the above exception, another exception occurred:

ApplicationError                          Traceback (most recent call last)
Cell In[5], line 41
     39 i = i + 1
     40 for trainer in trainers:
---> 41     [sf, rd, eth, meth, hyd, oxy,] = build_flowsheet(
     42         case, surrogate_type=trainer
     43     )
     44     steam_flow_error[(i, trainer)] = abs(
     45         (sf - value(outputs[i - 1, 0])) / value(outputs[i - 1, 0])
     46     )
     47     reformer_duty_error[(i, trainer)] = abs(
     48         (rd - value(outputs[i - 1, 1])) / value(outputs[i - 1, 1])
     49     )

Cell In[4], line 100, in build_flowsheet(case, surrogate_type)
     98     results = solver.solve(m, tee=True)
     99 except:  # retry solving one more time
--> 100     results = solver.solve(m, tee=True)
    102 return (
    103     value(m.fs.steam_flowrate),
    104     value(m.fs.reformer_duty),
   (...)
    108     value(m.fs.O2),
    109 )

File ~/checkouts/readthedocs.org/user_builds/idaes-examples/envs/latest/lib/python3.8/site-packages/pyomo/opt/base/solvers.py:534, in OptSolver.solve(self, *args, **kwds)
    531 def solve(self, *args, **kwds):
    532     """Solve the problem"""
--> 534     self.available(exception_flag=True)
    535     #
    536     # If the inputs are models, then validate that they have been
    537     # constructed! Collect suffix names to try and import from solution.
    538     #
    539     from pyomo.core.base.block import BlockData

File ~/checkouts/readthedocs.org/user_builds/idaes-examples/envs/latest/lib/python3.8/site-packages/pyomo/opt/solver/shellcmd.py:140, in SystemCallSolver.available(self, exception_flag)
    138     if exception_flag:
    139         msg = "No executable found for solver '%s'"
--> 140         raise ApplicationError(msg % self.name)
    141     return False
    142 return True

ApplicationError: No executable found for solver 'ipopt'

We can visualize these results by plotting a graph for each of the quantities above, creating a data series for each surrogate trainer. Some data series may overlay if values are identical for all cases:

from matplotlib import pyplot as plt

# create figure/axes for each plot sequentially, plotting each trainer as a separate data series

# Steam Flow Prediction
fig = plt.figure()
ax = fig.add_subplot()
for trainer in trainers:
    # pick out the points that use that trainer and plot them against case number
    sf = [steam_flow_error[(i, j)] for (i, j) in steam_flow_error if j == trainer]
    ax.plot(cases, sf, label=trainer)
# add info to plot
ax.set_xlabel("Cases")
ax.set_ylabel("Relative Error")
ax.set_title("Steam Flow Prediction")
ax.legend()
plt.yscale("log")
plt.show()

# Reformer Duty Prediction
fig = plt.figure()
ax = fig.add_subplot()
for trainer in trainers:
    # pick out the points that use that trainer and plot them against case number
    rd = [reformer_duty_error[(i, j)] for (i, j) in reformer_duty_error if j == trainer]
    ax.plot(cases, rd, label=trainer)
# add info to plot
ax.set_xlabel("Cases")
ax.set_ylabel("Relative Error")
ax.set_title("Reformer Duty Prediction")
ax.legend()
plt.yscale("log")
plt.show()

# C2H6 Mole Fraction Prediction
fig = plt.figure()
ax = fig.add_subplot()
for trainer in trainers:
    # pick out the points that use that trainer and plot them against case number
    eth = [conc_C2H6[(i, j)] for (i, j) in conc_C2H6 if j == trainer]
    ax.plot(cases, eth, label=trainer)
# add info to plot
ax.set_xlabel("Cases")
ax.set_ylabel("Absolute Error")
ax.set_title("C2H6 Mole Fraction Prediction (O(1E-2))")
ax.legend()
plt.yscale("log")
plt.show()

print()
print("Mole fraction predictions displayed with absolute error:")
print()

# CH4 Mole Fraction Prediction
fig = plt.figure()
ax = fig.add_subplot()
for trainer in trainers:
    # pick out the points that use that trainer and plot them against case number
    meth = [conc_CH4[(i, j)] for (i, j) in conc_CH4 if j == trainer]
    ax.plot(cases, meth, label=trainer)
# add info to plot
ax.set_xlabel("Cases")
ax.set_ylabel("Absolute Error")
ax.set_title("CH4 Mole Fraction Prediction (O(1E-1))")
ax.legend()
plt.yscale("log")
plt.show()

# H2 Mole Fraction Prediction
fig = plt.figure()
ax = fig.add_subplot()
for trainer in trainers:
    # pick out the points that use that trainer and plot them against case number
    hyd = [conc_H2[(i, j)] for (i, j) in conc_H2 if j == trainer]
    ax.plot(cases, hyd, label=trainer)
# add info to plot
ax.set_xlabel("Cases")
ax.set_ylabel("Absolute Error")
ax.set_title("H2 Mole Fraction Prediction (O(1E-1))")
ax.legend()
plt.yscale("log")
plt.show()

# O2 Mole Fraction Prediction
fig = plt.figure()
ax = fig.add_subplot()
for trainer in trainers:
    # pick out the points that use that trainer and plot them against case number
    oxy = [conc_O2[(i, j)] for (i, j) in conc_O2 if j == trainer]
    ax.plot(cases, oxy, label=trainer)
# add info to plot
ax.set_xlabel("Cases")
ax.set_ylabel("Absolute Error")
ax.set_title("O2 Mole Fraction Prediction (O(1E-20))")
ax.legend()
plt.yscale("log")
plt.show()

../../_images/72d672c14c987ed755a196f1200ab4c01b0df609b7e9d1f636960c5eb7b4301b.png

../../_images/dfa91276a31c7ce0d5c8be6be1f4703b0a91435204f264b2cff7c5103249bad2.png

../../_images/74732587fb293eb49a08f5f90e071a7f1d4afeacf776bbfbfda7a522907fc0f5.png

Mole fraction predictions displayed with absolute error:

../../_images/5d170807ee884f782ab5b24f45246a0752e8ad10e5b8c0dce3617dd1f59a6e9c.png

../../_images/3c12b2243aad29bd2b309df91798606059baed5bcc6c970c2784cb172d130418.png

../../_images/2f3642ce16eb11bd21ce0c2070b524b5fab8ed7bf1e8a652b79cf5afcf496f62.png

4.3 Comparing Surrogate Optimization#

Extending this analysis, we will run a single optimization scenario for each surrogate model and compare results. As in previous examples detailing workflows for ALAMO, PySMO and Keras, we will optimize hydrogen production while restricting nitrogen below 34 mol% in the product stream.

# Import additional Pyomo libraries
from pyomo.environ import Objective, maximize


def run_optimization(surrogate_type=None):
    print(surrogate_type)
    # create the IDAES model and flowsheet
    m = ConcreteModel()
    m.fs = FlowsheetBlock(dynamic=False)

    # create flowsheet input variables
    m.fs.bypass_frac = Var(
        initialize=0.80, bounds=[0.1, 0.8], doc="natural gas bypass fraction"
    )
    m.fs.ng_steam_ratio = Var(
        initialize=0.80, bounds=[0.8, 1.2], doc="natural gas to steam ratio"
    )

    # create flowsheet output variables
    m.fs.steam_flowrate = Var(initialize=0.2, doc="steam flowrate")
    m.fs.reformer_duty = Var(initialize=10000, doc="reformer heat duty")
    m.fs.AR = Var(initialize=0, doc="AR fraction")
    m.fs.C2H6 = Var(initialize=0, doc="C2H6 fraction")
    m.fs.C3H8 = Var(initialize=0, doc="C3H8 fraction")
    m.fs.C4H10 = Var(initialize=0, doc="C4H10 fraction")
    m.fs.CH4 = Var(initialize=0, doc="CH4 fraction")
    m.fs.CO = Var(initialize=0, doc="CO fraction")
    m.fs.CO2 = Var(initialize=0, doc="CO2 fraction")
    m.fs.H2 = Var(initialize=0, doc="H2 fraction")
    m.fs.H2O = Var(initialize=0, doc="H2O fraction")
    m.fs.N2 = Var(initialize=0, doc="N2 fraction")
    m.fs.O2 = Var(initialize=0, doc="O2 fraction")

    # create input and output variable object lists for flowsheet
    inputs = [m.fs.bypass_frac, m.fs.ng_steam_ratio]
    outputs = [
        m.fs.steam_flowrate,
        m.fs.reformer_duty,
        m.fs.AR,
        m.fs.C2H6,
        m.fs.C3H8,
        m.fs.C4H10,
        m.fs.CH4,
        m.fs.CO,
        m.fs.CO2,
        m.fs.H2,
        m.fs.H2O,
        m.fs.N2,
        m.fs.O2,
    ]

    # create the Pyomo/IDAES block that corresponds to the surrogate
    # call correct PySMO object to use below (will let us avoid nested switches)

    # capture long output from loading surrogates (don't need to print it)
    stream = StringIO()
    oldstdout = sys.stdout
    sys.stdout = stream

    if surrogate_type == SurrType.ALAMO:
        surrogate = AlamoSurrogate.load_from_file("alamo_surrogate.json")
        m.fs.surrogate = SurrogateBlock()
        m.fs.surrogate.build_model(surrogate, input_vars=inputs, output_vars=outputs)
    elif surrogate_type == SurrType.KERAS:
        keras_surrogate = KerasSurrogate.load_from_folder("keras_surrogate")
        m.fs.surrogate = SurrogateBlock()
        m.fs.surrogate.build_model(
            keras_surrogate,
            formulation=KerasSurrogate.Formulation.FULL_SPACE,
            input_vars=inputs,
            output_vars=outputs,
        )
    elif SurrType.is_pysmo(
        surrogate_type
    ):  # surrogate is one of the three pysmo basis options
        surrogate = PysmoSurrogate.load_from_file(
            surrogate_type.value + "_surrogate.json"
        )
        m.fs.surrogate = SurrogateBlock()
        m.fs.surrogate.build_model(surrogate, input_vars=inputs, output_vars=outputs)
    else:
        raise ValueError(f"Unknown surrogate type: {surrogate_type}")

    # revert to standard output
    sys.stdout = oldstdout

    # unfix input values and add the objective/constraint to the model
    m.fs.bypass_frac.unfix()
    m.fs.ng_steam_ratio.unfix()
    m.fs.obj = Objective(expr=m.fs.H2, sense=maximize)
    m.fs.con = Constraint(expr=m.fs.N2 <= 0.34)

    solver = SolverFactory("ipopt")
    try:  # attempt to solve problem
        results = solver.solve(m, tee=True)
    except:  # retry solving one more time
        results = solver.solve(m, tee=True)

    return inputs, outputs

# create list objects to store data, run optimization
results = {}
for trainer in trainers:
    inputs, outputs = run_optimization(trainer)
    for var in inputs:
        results[(var.name, trainer)] = value(var)
    for var in outputs:
        results[(var.name, trainer)] = value(var)

SurrType.PYSMO_KRG
2023-11-02 10:25:04 [INFO] idaes.core.surrogate.pysmo_surrogate: Decode surrogate. type=kriging

Ipopt 3.13.2: 

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt

This version of Ipopt was compiled from source code available at
    https://github.com/IDAES/Ipopt as part of the Institute for the Design of
    Advanced Energy Systems Process Systems Engineering Framework (IDAES PSE
    Framework) Copyright (c) 2018-2019. See https://github.com/IDAES/idaes-pse.

This version of Ipopt was compiled using HSL, a collection of Fortran codes
    for large-scale scientific computation.  All technical papers, sales and
    publicity material resulting from use of the HSL codes within IPOPT must
    contain the following acknowledgement:
        HSL, a collection of Fortran codes for large-scale scientific
        computation. See http://www.hsl.rl.ac.uk.
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver ma27.

Number of nonzeros in equality constraint Jacobian...:       39
Number of nonzeros in inequality constraint Jacobian.:        1
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:       15
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        2
                     variables with only upper bounds:        0
Total number of equality constraints.................:       13
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        1

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 -0.0000000e+00 4.20e+01 2.20e-02  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1 -2.2210575e-01 8.98e+00 1.39e-02  -1.7 6.93e+02    -  9.41e-01 1.00e+00h  1
   2 -3.1543427e-01 4.63e+02 3.09e-02  -2.5 6.13e+03    -  9.54e-01 1.00e+00h  1
   3 -3.2648096e-01 1.67e+03 1.99e-02  -2.5 1.85e+04    -  5.27e-01 4.45e-01h  1
   4 -3.2734223e-01 2.21e+02 6.68e-03  -2.5 9.12e+03    -  1.00e+00 1.00e+00h  1
   5 -3.2603609e-01 3.46e+01 3.14e-04  -2.5 2.48e+01    -  1.00e+00 1.00e+00h  1
   6 -3.2609557e-01 2.53e-01 3.18e-06  -2.5 1.40e+02    -  1.00e+00 1.00e+00h  1
   7 -3.3096545e-01 6.96e+01 4.78e-03  -3.8 3.81e+03    -  9.20e-01 1.00e+00h  1
   8 -3.3154563e-01 3.39e+00 3.75e-04  -3.8 1.11e+03    -  1.00e+00 1.00e+00h  1
   9 -3.3138393e-01 5.32e-01 1.27e-05  -3.8 3.19e+02    -  1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10 -3.3139248e-01 4.61e-04 6.78e-08  -3.8 1.24e+01    -  1.00e+00 1.00e+00h  1
  11 -3.3167648e-01 1.35e-01 7.35e-04  -5.7 1.71e+02    -  1.00e+00 9.75e-01h  1
  12 -3.3168623e-01 7.84e-04 7.02e-09  -5.7 1.52e+01    -  1.00e+00 1.00e+00h  1
  13 -3.3168990e-01 3.58e-05 2.76e-09  -8.6 1.64e+00    -  1.00e+00 1.00e+00h  1
  14 -3.3168990e-01 4.38e-09 2.72e-13  -8.6 2.99e-03    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 14

                                   (scaled)                 (unscaled)
Objective...............:  -3.3168990316632091e-01   -3.3168990316632091e-01
Dual infeasibility......:   2.7227776253254969e-13    2.7227776253254969e-13
Constraint violation....:   1.5548231598831156e-11    4.3801264837384224e-09
Complementarity.........:   2.5059042947426033e-09    2.5059042947426033e-09
Overall NLP error.......:   2.5059042947426033e-09    4.3801264837384224e-09


Number of objective function evaluations             = 15
Number of objective gradient evaluations             = 15
Number of equality constraint evaluations            = 15
Number of inequality constraint evaluations          = 15
Number of equality constraint Jacobian evaluations   = 15
Number of inequality constraint Jacobian evaluations = 15
Number of Lagrangian Hessian evaluations             = 14
Total CPU secs in IPOPT (w/o function evaluations)   =      0.002
Total CPU secs in NLP function evaluations           =      0.001

EXIT: Optimal Solution Found.


SurrType.PYSMO_RBF
2023-11-02 10:25:05 [INFO] idaes.core.surrogate.pysmo_surrogate: Decode surrogate. type=rbf

Ipopt 3.13.2: 

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt

This version of Ipopt was compiled from source code available at
    https://github.com/IDAES/Ipopt as part of the Institute for the Design of
    Advanced Energy Systems Process Systems Engineering Framework (IDAES PSE
    Framework) Copyright (c) 2018-2019. See https://github.com/IDAES/idaes-pse.

This version of Ipopt was compiled using HSL, a collection of Fortran codes
    for large-scale scientific computation.  All technical papers, sales and
    publicity material resulting from use of the HSL codes within IPOPT must
    contain the following acknowledgement:
        HSL, a collection of Fortran codes for large-scale scientific
        computation. See http://www.hsl.rl.ac.uk.
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver ma27.

Number of nonzeros in equality constraint Jacobian...:       39
Number of nonzeros in inequality constraint Jacobian.:        1
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:       15
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        2
                     variables with only upper bounds:        0
Total number of equality constraints.................:       13
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        1

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 -0.0000000e+00 4.58e+01 2.28e-02  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1 -2.2187269e-01 1.44e+01 1.33e-02  -1.7 6.80e+02    -  9.44e-01 1.00e+00h  1
   2 -3.1833654e-01 6.49e+02 1.65e-02  -2.5 6.42e+03    -  9.23e-01 1.00e+00h  1
   3 -3.2904216e-01 1.98e+03 3.27e-03  -2.5 1.61e+04    -  6.11e-01 5.97e-01h  1
   4 -3.2728772e-01 2.07e+02 3.54e-03  -2.5 8.41e+03    -  1.00e+00 1.00e+00h  1
   5 -3.2617335e-01 1.27e+01 3.00e-04  -2.5 7.83e+02    -  1.00e+00 1.00e+00h  1
   6 -3.2612431e-01 5.10e-03 1.28e-06  -2.5 4.03e+01    -  1.00e+00 1.00e+00h  1
   7 -3.3094622e-01 5.19e+01 4.80e-03  -3.8 3.78e+03    -  9.21e-01 1.00e+00h  1
   8 -3.3154557e-01 3.78e+00 1.51e-03  -3.8 1.17e+03    -  1.00e+00 9.89e-01h  1
   9 -3.3138015e-01 4.26e-01 1.61e-05  -3.8 3.24e+02    -  1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10 -3.3138995e-01 5.21e-04 8.48e-08  -3.8 1.39e+01    -  1.00e+00 1.00e+00h  1
  11 -3.3167385e-01 1.35e-01 8.06e-04  -5.7 1.76e+02    -  1.00e+00 9.73e-01h  1
  12 -3.3168390e-01 1.87e-04 4.24e-08  -5.7 1.52e+01    -  1.00e+00 1.00e+00h  1
  13 -3.3168758e-01 3.70e-05 1.36e-09  -8.6 1.69e+00    -  1.00e+00 1.00e+00h  1
  14 -3.3168758e-01 8.00e-10 5.07e-14  -8.6 3.09e-03    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 14

                                   (scaled)                 (unscaled)
Objective...............:  -3.3168758389150360e-01   -3.3168758389150360e-01
Dual infeasibility......:   5.0701311326148040e-14    5.0701311326148040e-14
Constraint violation....:   2.9648061330219216e-12    8.0035533756017685e-10
Complementarity.........:   2.5059042476427779e-09    2.5059042476427779e-09
Overall NLP error.......:   2.5059042476427779e-09    2.5059042476427779e-09


Number of objective function evaluations             = 15
Number of objective gradient evaluations             = 15
Number of equality constraint evaluations            = 15
Number of inequality constraint evaluations          = 15
Number of equality constraint Jacobian evaluations   = 15
Number of inequality constraint Jacobian evaluations = 15
Number of Lagrangian Hessian evaluations             = 14
Total CPU secs in IPOPT (w/o function evaluations)   =      0.004
Total CPU secs in NLP function evaluations           =      0.002

EXIT: Optimal Solution Found.


SurrType.PYSMO_PLY
2023-11-02 10:25:05 [INFO] idaes.core.surrogate.pysmo_surrogate: Decode surrogate. type=poly

Ipopt 3.13.2: 

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt

This version of Ipopt was compiled from source code available at
    https://github.com/IDAES/Ipopt as part of the Institute for the Design of
    Advanced Energy Systems Process Systems Engineering Framework (IDAES PSE
    Framework) Copyright (c) 2018-2019. See https://github.com/IDAES/idaes-pse.

This version of Ipopt was compiled using HSL, a collection of Fortran codes
    for large-scale scientific computation.  All technical papers, sales and
    publicity material resulting from use of the HSL codes within IPOPT must
    contain the following acknowledgement:
        HSL, a collection of Fortran codes for large-scale scientific
        computation. See http://www.hsl.rl.ac.uk.
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver ma27.

Number of nonzeros in equality constraint Jacobian...:       39
Number of nonzeros in inequality constraint Jacobian.:        1
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:       15
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        2
                     variables with only upper bounds:        0
Total number of equality constraints.................:       13
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        1

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 -0.0000000e+00 5.35e+01 3.24e-02  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1 -2.2227750e-01 1.30e+01 1.45e-02  -1.7 6.89e+02    -  9.38e-01 1.00e+00h  1
   2 -3.1492876e-01 5.64e+02 3.41e-02  -2.5 6.01e+03    -  9.71e-01 1.00e+00h  1
   3 -3.2714017e-01 2.01e+03 1.99e-02  -2.5 1.39e+04    -  6.00e-01 7.01e-01h  1
   4 -3.2662426e-01 2.91e+02 5.57e-03  -2.5 7.48e+03    -  1.00e+00 1.00e+00h  1
   5 -3.2618524e-01 1.47e+01 1.24e-04  -2.5 4.56e+02    -  1.00e+00 1.00e+00h  1
   6 -3.2612971e-01 4.24e-02 9.16e-07  -2.5 8.41e+01    -  1.00e+00 1.00e+00h  1
   7 -3.3096494e-01 8.05e+01 4.63e-03  -3.8 3.74e+03    -  9.24e-01 1.00e+00h  1
   8 -3.3151074e-01 7.83e+00 1.23e-04  -3.8 1.13e+03    -  1.00e+00 1.00e+00h  1
   9 -3.3138534e-01 5.65e-01 1.09e-05  -3.8 2.55e+02    -  1.00e+00 1.00e+00h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10 -3.3139357e-01 1.08e-03 3.11e-08  -3.8 1.28e+01    -  1.00e+00 1.00e+00h  1
  11 -3.3167896e-01 1.27e-01 8.04e-04  -5.7 1.96e+02    -  1.00e+00 9.72e-01h  1
  12 -3.3168859e-01 1.17e-03 2.28e-08  -5.7 1.38e+01    -  1.00e+00 1.00e+00h  1
  13 -3.3169228e-01 3.87e-07 1.13e-09  -8.6 1.95e+00    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 13

                                   (scaled)                 (unscaled)
Objective...............:  -3.3169227859422723e-01   -3.3169227859422723e-01
Dual infeasibility......:   1.1339429495425323e-09    1.1339429495425323e-09
Constraint violation....:   1.4249529091406254e-09    3.8666985346935689e-07
Complementarity.........:   3.9898113808548269e-09    3.9898113808548269e-09
Overall NLP error.......:   3.9898113808548269e-09    3.8666985346935689e-07


Number of objective function evaluations             = 14
Number of objective gradient evaluations             = 14
Number of equality constraint evaluations            = 14
Number of inequality constraint evaluations          = 14
Number of equality constraint Jacobian evaluations   = 14
Number of inequality constraint Jacobian evaluations = 14
Number of Lagrangian Hessian evaluations             = 13
Total CPU secs in IPOPT (w/o function evaluations)   =      0.002
Total CPU secs in NLP function evaluations           =      0.000

EXIT: Optimal Solution Found.


# print results as a table
df_index = []
for var in inputs:
    df_index.append(var.name)
for var in outputs:
    df_index.append(var.name)
df_cols = trainers

df = pd.DataFrame(index=df_index, columns=df_cols)
for i in df_index:
    for j in df_cols:
        df[j][i] = results[(i, j)]

df  # display results table

	SurrType.PYSMO_KRG	SurrType.PYSMO_RBF	SurrType.PYSMO_PLY
fs.bypass_frac	0.1	0.1	0.1
fs.ng_steam_ratio	1.124624	1.124305	1.126451
fs.steam_flowrate	1.226539	1.226153	1.228592
fs.reformer_duty	39082.345925	39062.177089	39131.26643
fs.AR	0.004107	0.004107	0.004107
fs.C2H6	0.000518	0.000545	0.000519
fs.C3H8	0.000115	0.000119	0.000114
fs.C4H10	0.000065	0.000068	0.000065
fs.CH4	0.016193	0.016991	0.0162
fs.CO	0.104837	0.104856	0.104419
fs.CO2	0.053535	0.053528	0.053561
fs.H2	0.33169	0.331688	0.331692
fs.H2O	0.148951	0.148918	0.149056
fs.N2	0.34	0.34	0.34
fs.O2	0.0	0.0	-0.0