Supercritical CO2 Property Surrogate with PySMO Surrogate Object - Training Surrogate (Part 1)

###############################################################################
# The Institute for the Design of Advanced Energy Systems Integrated Platform
# Framework (IDAES IP) was produced under the DOE Institute for the
# Design of Advanced Energy Systems (IDAES).
#
# Copyright (c) 2018-2023 by the software owners: The Regents of the
# University of California, through Lawrence Berkeley National Laboratory,
# National Technology & Engineering Solutions of Sandia, LLC, Carnegie Mellon
# University, West Virginia University Research Corporation, et al.
# All rights reserved.  Please see the files COPYRIGHT.md and LICENSE.md
# for full copyright and license information.
###############################################################################


Maintainer: Javal Vyas

Author: Javal Vyas

Updated: 2024-01-24

1. Introduction

This notebook illustrates the use of the PySMO Polynomial surrogate trainer to produce an ML surrogate based on supercritical CO2 data generated by simulation with the REFPROP package. PySMO also offers other training methods, such as Radial Basis Function and Kriging surrogate models, but here we focus on the Polynomial surrogate model.

There are several reasons to build surrogate models for complex processes, even when higher fidelity models already exist (e.g., to reduce model size, improve convergence reliability, or replace models that rely on externally compiled code and make them fully equation-oriented).

In this example, we build a surrogate for the physical properties of S-CO2 to be embedded in a property package. This property package will be used to obtain the physical properties of S-CO2 in the flowsheet simulation. To learn more about property packages, see the IDAES-PSE GitHub page or the IDAES Read the Docs documentation.

1.1 Need for ML Surrogates

The properties predicted by the surrogate are the enthalpy and entropy of S-CO2 as functions of the pressure and temperature of the system. The analytical equations for obtaining enthalpy and entropy from pressure and temperature are in differential form, and including them would turn the problem into a DAE system. To avoid this and keep the problem algebraic, we use ML surrogates that relate enthalpy and entropy to pressure and temperature through algebraic equations.
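For reference, the differential relations in question can be written with the standard thermodynamic identities for molar enthalpy $h$ and molar entropy $s$ as functions of temperature $T$ and pressure $P$. The symbols $c_p$ (molar heat capacity at constant pressure) and $v$ (molar volume), as well as the function names $f_h$ and $f_s$, are notation introduced here for illustration and do not come from the notebook:

```latex
\begin{aligned}
dh &= c_p\,dT + \left[\,v - T\left(\frac{\partial v}{\partial T}\right)_{P}\right] dP, \\
ds &= \frac{c_p}{T}\,dT - \left(\frac{\partial v}{\partial T}\right)_{P} dP, \\
h &\approx f_h(P, T), \qquad s \approx f_s(P, T).
\end{aligned}
```

The first two lines are the differential forms; the last line is the algebraic replacement that the surrogate is meant to provide, with $f_h$ and $f_s$ being the fitted polynomial expressions.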

1.2 Supercritical CO2 cycle process

The following flowsheet will be used to optimize the design for cooling a fusion reactor using a supercritical CO2 cycle. In this notebook we focus on training the surrogate; constructing the flowsheet and the property package is covered in the subsequent notebooks. The takeaway from this flowsheet is that three variables can be measured in any given unit (flow, pressure, and temperature), and the other properties can be calculated from them. Thus, the surrogate should take pressure and temperature as inputs.

In this example, we will train the model using polynomial regression for our data and then demonstrate that we can solve an optimization problem with that surrogate model.

from IPython.display import Image
from pathlib import Path

def datafile_path(name):
    return Path("..") / name

Image(datafile_path("CO2_flowsheet.png"))

[Figure: supercritical CO2 cycle flowsheet (CO2_flowsheet.png)]

2. Training and Validating Surrogate

First, let’s import the required Python and IDAES modules:

# Import statements
import os
import numpy as np
import pandas as pd

# Import IDAES libraries
from idaes.core.surrogate.sampling.data_utils import split_training_validation
from idaes.core.surrogate.pysmo_surrogate import PysmoPolyTrainer, PysmoSurrogate
from idaes.core.surrogate.plotting.sm_plotter import (
    surrogate_scatter2D,
    surrogate_parity,
    surrogate_residual,
)
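The polynomial trainer used later in this notebook reports `pyomo` as its parameter estimation method, and that method needs the `ipopt` executable; without it, training stops with `ApplicationError: No executable found for solver 'ipopt'`. A minimal pre-flight check, using only the standard library, could look like the sketch below. The helper name `solver_on_path` is hypothetical, and the check only inspects the system PATH, so a solver made available to Pyomo by some other route would not be detected.

```python
import shutil


def solver_on_path(executable_name):
    """Return True if an executable with this name is found on the system PATH."""
    return shutil.which(executable_name) is not None


# Warn early instead of failing in the middle of training.
if not solver_on_path("ipopt"):
    print("ipopt was not found on PATH; install it before training the surrogate.")
```

Running this before the training cell makes a missing solver visible up front rather than as a long traceback from inside the regression routine.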

2.1 Importing Training and Validation Datasets

In this section, we read the dataset from the CSV file located in this directory. 500 data points were simulated for S-CO2 physical properties using the REFPROP package. This example uses the entire dataset because a surrogate can overfit on a smaller dataset. The data is separated into training and validation sets using an 80/20 split with the IDAES split_training_validation() method.

We rename the column headers because they contain “.”, which may cause errors when the column names are read in subsequent code; as good practice, we change them to the variable names used in the property package. The input variables are pressure and temperature, and the output variables are enth_mol and entr_mol, so we create two new dataframes for the input and output variables.

# Import training data
np.set_printoptions(precision=6, suppress=True)

csv_data = pd.read_csv(datafile_path("500_Points_DataSet.csv"))
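# Rename the first six columns; rename() avoids mutating the column Index in place
new_names = ["pressure", "temperature", "enth_mol", "entr_mol", "CO2_enthalpy", "CO2_entropy"]
csv_data = csv_data.rename(columns=dict(zip(csv_data.columns[:6], new_names)))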
data = csv_data.sample(n=500)

input_data = data.iloc[:, :2]
output_data = data.iloc[:, 2:4]

# Define labels, and split training and validation data
input_labels = list(input_data.columns)
output_labels = list(output_data.columns)

n_data = data[input_labels[0]].size
data_training, data_validation = split_training_validation(
    data, 0.8, seed=n_data
)
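To make the 80/20 split concrete, the sketch below shows one way a seeded random split of row indices can work. It is an illustration only and is not the IDAES implementation of split_training_validation(); the function name `split_indices` is hypothetical, and the seed value mirrors the `seed=n_data` choice in the cell above.

```python
import random


def split_indices(n_rows, training_fraction, seed):
    """Shuffle row indices with a fixed seed and split them into two disjoint lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed gives the same shuffle
    n_train = int(round(n_rows * training_fraction))
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(500, 0.8, seed=500)
print(len(train_idx), len(val_idx))  # prints: 400 100
```

Fixing the seed makes the split reproducible, so the training and validation plots refer to the same rows on every run.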

2.2 Training Surrogates with PySMO

IDAES builds a model class for each type of PySMO surrogate model. In this case, we will call and build the Polynomial Regression class. Regression settings can be passed directly as class arguments, as shown below. In this example, the allowed basis terms span a 5th-order polynomial, a variable product term is included, extra features are defined, and the data is internally cross-validated using 10 iterations of 80/20 splits to ensure a robust surrogate fit. Note that PySMO uses cross-validation of the training data to adjust model coefficients and ensure a more accurate fit, whereas we separate the validation dataset before training in order to visualize the surrogate fits.
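As a rough picture of what these settings imply, the sketch below lists the candidate regression terms at the highest polynomial order. It assumes the basis consists of a constant, pure powers of each input, pairwise input products when multinomials are enabled, and the user-supplied extra features; this is a reading of the options described above, not PySMO's internal code, and the function name `basis_terms` is hypothetical.

```python
def basis_terms(input_names, max_order, multinomials, extra_features):
    """List candidate regression terms as strings (illustrative only)."""
    terms = ["1"]  # constant term
    # Pure powers of each input, from order 1 up to max_order
    for order in range(1, max_order + 1):
        for name in input_names:
            terms.append(name if order == 1 else f"{name}^{order}")
    # Pairwise products of distinct inputs
    if multinomials:
        for i, first in enumerate(input_names):
            for second in input_names[i + 1:]:
                terms.append(f"{first}*{second}")
    # User-supplied extra features are appended unchanged
    terms.extend(extra_features)
    return terms


terms = basis_terms(
    ["pressure", "temperature"],
    max_order=5,
    multinomials=True,
    extra_features=[
        "pressure*temperature*temperature",
        "pressure*pressure*temperature*temperature",
        "pressure*pressure*temperature",
        "pressure/temperature",
        "temperature/pressure",
    ],
)
print(len(terms))  # prints: 17
```

Under these assumptions each output is fitted with at most 17 coefficients, which is small relative to the 400 training points.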

Finally, after training the model, we save the results and model expressions to a serialized JSON file. Serializing the model in this fashion enables importing a previously trained set of surrogate models into external flowsheets. This feature will be used later.

# Create PySMO trainer object
trainer = PysmoPolyTrainer(
    input_labels=input_labels,
    output_labels=output_labels,
    training_dataframe=data_training,
)

var = output_labels
# Extra basis terms supplied in addition to the polynomial terms
trainer.config.extra_features = [
    "pressure*temperature*temperature",
    "pressure*pressure*temperature*temperature",
    "pressure*pressure*temperature",
    "pressure/temperature",
    "temperature/pressure",
]
# Set PySMO options
trainer.config.maximum_polynomial_order = 5
trainer.config.multinomials = True
trainer.config.training_split = 0.8
trainer.config.number_of_crossvalidations = 10

# Train surrogate (calls PySMO through IDAES Python wrapper)
poly_train = trainer.train_surrogate()

# create callable surrogate object
xmin, xmax = [7,306], [40,1000]
input_bounds = {input_labels[i]: (xmin[i], xmax[i]) for i in range(len(input_labels))}
poly_surr = PysmoSurrogate(poly_train, input_labels, output_labels, input_bounds)
# save model to JSON
model = poly_surr.save_to_file("pysmo_poly_surrogate.json", overwrite=True)

===========================Polynomial Regression===============================================

No iterations will be run.
Default parameter estimation method is used.
Parameter estimation method:  pyomo 

No iterations will be run.

2.3 Visualizing surrogates

Now that the surrogate models have been trained, the models can be visualized through scatter, parity, and residual plots to confirm their validity in the chosen domain. The training data is visualized first to confirm that the surrogates fit the data, and then the validation data is visualized to confirm that the surrogates accurately predict new output values.

# visualize with IDAES surrogate plotting tools
surrogate_scatter2D(poly_surr, data_training, filename="pysmo_poly_train_scatter2D.pdf")
surrogate_parity(poly_surr, data_training, filename="pysmo_poly_train_parity.pdf")
surrogate_residual(poly_surr, data_training, filename="pysmo_poly_train_residual.pdf")

[Figures: scatter, parity, and residual plots of the polynomial surrogate against the training data]

2.4 Model Validation

We check the fit on the validation set to see whether the surrogate generalizes to data it was not trained on. This step can be used to check for overfitting on the training set.

# visualize with IDAES surrogate plotting tools
surrogate_scatter2D(poly_surr, data_validation, filename="pysmo_poly_val_scatter2D.pdf")
surrogate_parity(poly_surr, data_validation, filename="pysmo_poly_val_parity.pdf")
surrogate_residual(poly_surr, data_validation, filename="pysmo_poly_val_residual.pdf")
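Alongside the plots, a numerical summary can make the overfitting check easier to compare between the training and validation sets. The sketch below computes the root-mean-square error and the coefficient of determination from two lists of values. The function names `rmse` and `r_squared` are hypothetical helpers, and the numbers in the usage lines are placeholder values chosen for illustration, not results from this notebook.

```python
import math


def rmse(true_values, predicted_values):
    """Root-mean-square error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(true_values, predicted_values):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only
true_vals = [1.0, 2.0, 3.0, 4.0]
pred_vals = [1.1, 1.9, 3.2, 3.8]
print(rmse(true_vals, pred_vals), r_squared(true_vals, pred_vals))
```

A validation error much larger than the training error for the same output would suggest overfitting; similar values on both sets are consistent with a surrogate that generalizes within the sampled domain.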