Algorithmic Response Surface Design
What is algorithmic design for response surface methodology (RSM)?
An algorithmic design approach for RSM uses I-optimal designs, which have statistical properties similar to those of the classical designs used for response optimization, such as central composite designs and Box-Behnken designs. Algorithmic designs are built to optimize one of several optimality criteria, including I-optimality. I-optimality minimizes the average prediction variance over the design space and is used for second-order models (main effects plus quadratic terms and two-way interactions), where precise predictions take priority because the goal is to optimize your response.
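For reference, the I-criterion is commonly written as the average relative prediction variance over the design region. The symbols below are notation introduced here for illustration, not taken from the text: X is the model matrix of the design, f(x) is the expansion of a point x into the model terms, and R is the design region.

```latex
I = \frac{1}{\int_{R} d\mathbf{x}} \int_{R} \mathbf{f}(\mathbf{x})^{\top} \left( \mathbf{X}^{\top} \mathbf{X} \right)^{-1} \mathbf{f}(\mathbf{x}) \, d\mathbf{x}
```

Under this definition, an I-optimal design is the set of runs that makes I as small as possible for the chosen model and number of runs.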
When should I use algorithmic design for RSM?
Algorithmic designs for RSM can be used when optimization is the goal. They are especially useful when you need flexibility in your design beyond what classical designs provide. For example, using algorithmic designs for RSM would be effective when you have multiple factor types or restrictions on the design space.
Algorithmic design for RSM can include:
- Categorical factor types.
- Infeasible factor combinations.
- Blocking.
- Nonstandard models.
- Restricted randomization.
- A customized run budget.
An introduction to algorithmic design for RSM
An algorithmic design approach for RSM matches your design to your problem rather than fitting your problem into a predetermined design. DOE software, such as JMP, uses an algorithm to create an RSM design that meets your design specifications by optimizing one of several optimality criteria. The optimality criterion used for RSM is I-optimality, because of its focus on precise predictions.
Let’s look at two common situations where algorithmic designs can provide flexibility for RSM. In many situations, a categorical factor is also of interest in an RSM design. For instance, suppose we want to understand how three suppliers affect our response of interest, in addition to three continuous factors. Traditionally, we would run a separate RSM design for each supplier, since a categorical factor could not be included in a classical RSM design. For three continuous factors, we could run a 15-run Box-Behnken design. Running a 15-run Box-Behnken design for each of the three suppliers comes to 45 total runs, which can quickly exceed a run budget. With algorithmic designs, however, we could include supplier as a factor alongside the three continuous factors and run a single 24-run design to understand the impact on the response. The 24-run count is based on heuristics for creating a balanced design for the proposed second-order model terms, with some extra runs to estimate an error term.
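The run arithmetic in the supplier example can be checked with a short sketch. The function name `second_order_parameter_count` and the choice of six extra runs are assumptions made here for illustration; the text only says that "some extra runs" are added, and six is the value that reproduces the 24-run total.

```python
from math import comb

def second_order_parameter_count(n_continuous, categorical_levels=()):
    """Count the parameters of a full second-order (RSM) model.

    A categorical factor with k levels contributes k - 1 parameters to its
    main effect and to each interaction it takes part in. Categorical
    factors get no quadratic terms.
    """
    cat_df = [k - 1 for k in categorical_levels]
    intercept = 1
    main_effects = n_continuous + sum(cat_df)
    quadratics = n_continuous
    cont_by_cont = comb(n_continuous, 2)
    cont_by_cat = n_continuous * sum(cat_df)
    cat_by_cat = sum(
        cat_df[i] * cat_df[j]
        for i in range(len(cat_df))
        for j in range(i + 1, len(cat_df))
    )
    return (intercept + main_effects + quadratics
            + cont_by_cont + cont_by_cat + cat_by_cat)

# Classical route: one 15-run Box-Behnken design per supplier.
classical_runs = 15 * 3

# Algorithmic route: one design with supplier as a three-level factor.
params = second_order_parameter_count(3, categorical_levels=(3,))
extra_runs = 6  # assumption: chosen to match the 24 runs quoted in the text
algorithmic_runs = params + extra_runs

print(classical_runs, params, algorithmic_runs)
```

With these assumptions the model has 18 parameters, so 24 runs leave six runs beyond the minimum needed to estimate it, compared with 45 runs for the three separate classical designs.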
Another situation where algorithmic designs for RSM provide flexibility is when we know in advance that certain factor level combinations are infeasible, for reasons such as safety. For example, consider optimizing a new pizza baking process. There are conditions that we know would not produce an acceptable pizza. Baking a pizza at a high temperature for a long time is going to give us a burnt pizza. Likewise, baking a pizza at a low temperature for a short time is going to give us an underbaked pizza. In this example, these constraints give the design region an irregular shape.
Algorithmic designs for RSM give us the flexibility to accommodate these design challenges, as well as other cases such as blocking and a customized run budget. Algorithmic designs can also reproduce the classical designs used in response surface methodology when those designs are optimal for the design specifications.
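A constrained region like the pizza example can be sketched by filtering a grid of candidate runs. Everything numeric here is hypothetical: the coded -1 to +1 scale, the 5 x 5 grid, and the linear constraint `abs(temperature + time) <= limit` are stand-ins chosen only to cut off the "hot and long" and "cool and short" corners.

```python
import itertools
import numpy as np

# Hypothetical coded scale: -1 is the low setting, +1 the high setting.
levels = np.linspace(-1.0, 1.0, 5)
candidates = np.array(list(itertools.product(levels, levels)))

def is_feasible(temperature, time, limit=1.0):
    """Assumed constraint: reject runs that are both hot and long (burnt)
    or both cool and short (underbaked)."""
    return abs(temperature + time) <= limit

feasible = np.array([pt for pt in candidates if is_feasible(*pt)])

print(f"{len(candidates)} candidate runs, {len(feasible)} feasible")
```

An algorithmic design would then choose its runs only from the feasible points, which is what lets it handle an irregular region that a classical design cannot.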
An example of algorithmic design for RSM
Let’s explore in more detail the design presented in the RSM design overview. The objective of the experiment is to improve a process by finding the operating conditions that produce the highest Yield and lowest Impurity. Maximizing Yield is as important as minimizing Impurity. To create an algorithmic design, we first choose the responses and the factors.
The responses and factors are:
- Yield: The response goal is to maximize (higher is better).
- Impurity: The response goal is to minimize (lower is better).
- pH: The factor range is 5 to 8.
- Temperature: The factor range is 15° to 45° Celsius.
- Vendor: There are three vendors (factor levels): Good, Fast, and Cheap.
Operational decisions about which pH and Temperature settings and which Vendor to use are based on how these factors predict Yield and Impurity. Subject matter knowledge is also important for defining suitably narrow factor ranges. To make accurate operational decisions, it is critical that the predictions of Yield and Impurity include as little error as possible. I-optimality is the ideal optimality criterion here because it minimizes the average prediction variance.
Next, we need to propose an initial statistical model. The proposed statistical model is directly related to the optimization goal of this experiment. We want to propose a second-order model including quadratic and interaction terms, which offer flexibility for predicting and optimizing Yield and Impurity. The quadratic terms for the continuous factors allow us to determine if there is a peak or a valley in either Yield or Impurity between the high and low levels of pH and Temperature. Quadratic effects cannot be estimated for categorical factors. When quadratic effects are specified in the proposed initial statistical model, a third level, the midpoint, is added to the design for pH and Temperature. In the design, pH will be tested at 5, 6.5, and 8 while Temperature will be tested at 15°, 30°, and 45°.
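To see why the midpoint level is needed, the following sketch builds the model matrix for the proposed second-order model and compares a two-level layout with a three-level one. The coding choices are assumptions made for illustration: continuous factors are scaled to -1..+1, and Vendor uses effect coding with two columns. The helper names `coded`, `model_row`, and `model_matrix` are hypothetical.

```python
import itertools
import numpy as np

# Assumed effect coding for the three-level Vendor factor.
VENDOR_CODES = {"Good": (1.0, 0.0), "Fast": (0.0, 1.0), "Cheap": (-1.0, -1.0)}

def coded(value, low, high):
    """Map a natural-unit setting onto the coded -1..+1 scale."""
    return (2.0 * value - (high + low)) / (high - low)

def model_row(ph, temperature, vendor):
    """Expand one run into the 12 columns of the second-order model."""
    x1 = coded(ph, 5.0, 8.0)
    x2 = coded(temperature, 15.0, 45.0)
    v1, v2 = VENDOR_CODES[vendor]
    return [1.0, x1, x2, v1, v2,                              # intercept, main effects
            x1 * x2, x1 * v1, x1 * v2, x2 * v1, x2 * v2,      # two-way interactions
            x1 * x1, x2 * x2]                                 # quadratic terms

def model_matrix(ph_levels, temp_levels):
    runs = itertools.product(ph_levels, temp_levels, VENDOR_CODES)
    return np.array([model_row(*run) for run in runs])

two_level = model_matrix([5.0, 8.0], [15.0, 45.0])
three_level = model_matrix([5.0, 6.5, 8.0], [15.0, 30.0, 45.0])

print("two levels:  ", two_level.shape, "rank", np.linalg.matrix_rank(two_level))
print("three levels:", three_level.shape, "rank", np.linalg.matrix_rank(three_level))
```

With only the low and high settings, each squared column equals 1 on every run and is indistinguishable from the intercept, so the matrix is rank deficient. Adding the midpoints (pH 6.5 and 30°) makes all 12 columns estimable.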
The model terms included in the initial statistical model are noted in the figure below.
An I-optimal design is generated based on the model terms we include. Any other choices we made, such as constraints on the design region, would also influence the design generated. In this example, we did not have constraints or other choices that would influence the design. The design consists of the number of runs needed to estimate the proposed statistical model plus five or six extra unique runs for estimating model error. The proposed model has 12 parameters, because Vendor, with three levels, contributes two parameters to its main effect and to each of its interactions. For our example, an RSM design with 18 runs is proposed as shown below.
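Separately from the design table referenced above, here is a minimal sketch of how an algorithm can search for a design with low average prediction variance. It uses a simple point-exchange heuristic over a 27-point candidate grid, chosen here for brevity; it is not necessarily the algorithm any particular software uses, and it will not necessarily reproduce the 18-run design from the example. The effect coding for Vendor, the integration grid, the small ridge term, and all function names are assumptions.

```python
import itertools
import numpy as np

# Assumed effect coding for the three-level Vendor factor.
VENDOR_CODES = {"Good": (1.0, 0.0), "Fast": (0.0, 1.0), "Cheap": (-1.0, -1.0)}

def model_row(x1, x2, vendor):
    """Expand one run (coded pH, coded Temperature, Vendor) into 12 model terms."""
    v1, v2 = VENDOR_CODES[vendor]
    return [1.0, x1, x2, v1, v2,
            x1 * x2, x1 * v1, x1 * v2, x2 * v1, x2 * v2,
            x1 * x1, x2 * x2]

def expand(levels):
    """Model matrix for every combination of the coded levels and vendors."""
    return np.array([model_row(a, b, v)
                     for a, b, v in itertools.product(levels, levels, VENDOR_CODES)])

candidates = expand([-1.0, 0.0, 1.0])         # 27 candidate runs
region = expand(np.linspace(-1.0, 1.0, 11))   # fine grid approximating the region
moments = region.T @ region / len(region)     # average of f(x) f(x)^T over the grid

def i_criterion(X, ridge=1e-9):
    """Average relative prediction variance, trace((X'X)^-1 M).

    The tiny ridge keeps the solve defined for singular starting designs,
    which then simply score very badly.
    """
    info = X.T @ X + ridge * np.eye(X.shape[1])
    return float(np.trace(np.linalg.solve(info, moments)))

def point_exchange(n_runs, seed=0, max_passes=50):
    """Swap one run at a time for a candidate whenever the criterion drops."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=n_runs, replace=False)
    start = best = i_criterion(candidates[idx])
    for _ in range(max_passes):
        improved = False
        for i in range(n_runs):
            for j in range(len(candidates)):
                trial = idx.copy()
                trial[i] = j
                value = i_criterion(candidates[trial])
                if value < best * (1.0 - 1e-9):
                    idx, best, improved = trial, value, True
        if not improved:
            break
    return idx, start, best

design_idx, start_value, final_value = point_exchange(n_runs=18)
design = candidates[design_idx]
print(f"I-criterion: start {start_value:.4f}, after exchange {final_value:.4f}")
```

A heuristic like this finds a local optimum, so in practice the search is usually repeated from several random starts and the best result is kept.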
This I-optimal design minimizes the average prediction variance as seen in the figures below. You’ll notice that the prediction variance remains low for a large portion of the design space, while increasing near the boundaries of the design space. The profiles below show the factor range on the X axis and the variance on the Y axis.
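Separately from the figures referenced above, the quantity plotted in such profiles can be computed directly. This sketch uses a deliberately small, hypothetical setup: two coded continuous factors, a full second-order model, and a 9-run design at every combination of -1, 0, and +1. It evaluates the relative prediction variance, meaning the variance of the predicted mean divided by the error variance, at chosen points.

```python
import itertools
import numpy as np

def f(x1, x2):
    """Second-order model expansion for two coded continuous factors."""
    return np.array([1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2])

# Illustrative design: every combination of the coded levels -1, 0, +1 (9 runs).
X = np.array([f(a, b) for a, b in itertools.product([-1.0, 0.0, 1.0], repeat=2)])
info_inv = np.linalg.inv(X.T @ X)

def relative_variance(x1, x2):
    """f(x)' (X'X)^-1 f(x): prediction variance in units of the error variance."""
    row = f(x1, x2)
    return float(row @ info_inv @ row)

center = relative_variance(0.0, 0.0)
corner = relative_variance(1.0, 1.0)
print(f"center: {center:.4f}, corner: {corner:.4f}")

# Profile along the first factor with the second held at its midpoint.
profile = [(float(x1), relative_variance(x1, 0.0))
           for x1 in np.linspace(-1.0, 1.0, 9)]
for x1, variance in profile:
    print(f"x1 = {x1:+.2f}  relative variance = {variance:.4f}")
```

For this small design the variance is higher at the corner of the region than at the center, which is the same kind of pattern the profiles describe.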
The values for Yield and Impurity are recorded in the data table after the experiment is executed following the design run order.
To analyze the experimental data, we use multiple linear regression to fit the initial “full” specified statistical model separately for each response, Yield and Impurity. The model terms are the same ones we specified during the design setup:
- Intercept.
- Three main effects (pH, Temperature, and Vendor).
- Three two-factor interactions.
- Two quadratic effects (for the continuous factors).
For both models, inactive effects are removed using variable selection. Active effects are terms that are statistically significant and influence the responses. Looking at the effect summary for both reduced models, we see that the quadratic terms for Temperature and pH are active, which indicates curvature. There are also active interactions between pH and Vendor and between pH and Temperature.
Let’s look at the reduced models for Yield and Impurity by observing cross-sections of the surface, or profiles. For our example, maximizing Yield is as important as minimizing Impurity. To find a combination of factor settings that balances the two goals, we combine the profile traces and optimize them together.
In this example, we determined that setting pH to 7.18 and Temperature to 33.71 °C and using the Vendor “Fast” is predicted to maximize Yield at 96.06 and minimize Impurity at 0.82%. It is worth noting, however, that other factor combinations might produce similar results.
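One common way to optimize two responses together is a desirability function, which rescales each predicted response to a 0 to 1 score and combines the scores with a weighted geometric mean. The text does not say which method was used, so this is only a sketch under that assumption. The acceptable ranges (Yield 80 to 100, Impurity 0 to 5) are hypothetical placeholders, the function names are invented for illustration, and only the predicted values 96.06 and 0.82 come from the example.

```python
import math

def desirability_maximize(y, low, high):
    """0 at or below `low`, 1 at or above `high`, linear in between."""
    return min(1.0, max(0.0, (y - low) / (high - low)))

def desirability_minimize(y, low, high):
    """1 at or below `low`, 0 at or above `high`, linear in between."""
    return min(1.0, max(0.0, (high - y) / (high - low)))

def overall_desirability(scores, weights):
    """Weighted geometric mean; any score of zero makes the result zero."""
    if any(score == 0.0 for score in scores):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(s) for s, w in zip(scores, weights)) / total)

# Predicted responses reported in the example.
predicted_yield, predicted_impurity = 96.06, 0.82

# Hypothetical acceptable ranges, chosen only for illustration.
d_yield = desirability_maximize(predicted_yield, low=80.0, high=100.0)
d_impurity = desirability_minimize(predicted_impurity, low=0.0, high=5.0)

# Equal weights reflect that both goals are equally important.
overall = overall_desirability([d_yield, d_impurity], weights=[1.0, 1.0])
print(f"Yield {d_yield:.3f}, Impurity {d_impurity:.3f}, overall {overall:.3f}")
```

An optimizer would search the factor settings for the combination with the highest overall score. Several different settings can score almost equally well, which is consistent with the note that other factor combinations might produce similar results.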