Algorithmic Screening Designs

What is an algorithmic screening design?

Algorithmic screening designs are constructed by design of experiments (DOE) software to accommodate your unique experimental context. An algorithmic design can incorporate many different factor types, constraints on the design space, restrictions on randomization, and varying numbers of runs, all while producing a design that is statistically optimal for factor screening.

When should I use an algorithmic screening design?

Algorithmic screening designs are applicable in almost any screening context. They are especially useful when your situation presents needs or limitations that a classical screening design cannot accommodate.

Why use algorithmic screening designs?

Screening designs provide an efficient way to identify which of many factors most strongly influence a response. Algorithmic screening designs are useful when classical screening designs, such as fractional factorial designs and Plackett-Burman designs, are impractical or impossible to conduct in your experimental context.

Classical screening designs have been used effectively for years. However, these designs were created many decades ago as broad solutions to be applied in routine DOE scenarios. In practice, experimenters can encounter contexts or challenges that classical designs cannot easily accommodate. For example, a classical screening design may require a run in an area of the factor space that is infeasible or impossible to explore, or it may require a number of runs that exceeds your run budget. Classical screening designs can force you to adapt your situation to fit the design, which might mean excluding factors or levels that you would otherwise test, or manually modifying the design to the detriment of its statistical properties.

Algorithmic screening designs instead use a computer algorithm to build a customized design that fits your unique experimental context. Algorithmic designs allow for many customizations, including nonstandard factor types (such as categorical factors with more than two levels), constraints on the design space, restrictions on randomization, and flexible run counts.

How are algorithmic screening designs constructed?

An algorithmic screening design is constructed by DOE software based on your requirements. After you specify the factors and their types, the desired number of runs, constraints on the design space, and other parameters, the software uses an algorithm to identify a statistically optimal screening design that meets your specifications. You can also specify whether the design will be used to screen only for main effects or for specific higher-order effects of interest.

“Statistically optimal” in this context is defined by a numeric value (an optimality criterion) that quantifies a statistical property of the design. In a typical screening design, the optimality criterion quantifies how well the design supports precise estimates of the factor effects. This criterion suits screening experiments because precise effect estimates are needed to accurately determine which factors most strongly affect the response. Other optimality criteria are useful for other DOE contexts; for example, algorithmic response surface experiments typically use an optimality criterion that quantifies how well the design supports a model that makes precise predictions of the response.
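To make the idea concrete, the sketch below implements a bare-bones coordinate-exchange search for a D-optimal main-effects design in coded continuous factors. It is a minimal illustration under simplifying assumptions, not how any particular software product works: commercial algorithms also handle categorical factors, constraints, and restricted randomization, and use far more sophisticated searches.

```python
import numpy as np

rng = np.random.default_rng(1)

def model_matrix(design):
    """Main-effects model matrix: an intercept column plus one column
    per (coded) continuous factor."""
    return np.column_stack([np.ones(design.shape[0]), design])

def d_criterion(design):
    """log det(X'X): larger values mean more precise effect estimates."""
    X = model_matrix(design)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def coordinate_exchange(n_runs, n_factors, levels=(-1.0, 0.0, 1.0), n_passes=20):
    """Greedy coordinate exchange: start from a random design, then sweep
    over every run-by-factor coordinate, keeping whichever candidate level
    most improves the D-criterion. Stop when a full sweep makes no gain."""
    design = rng.choice(levels, size=(n_runs, n_factors))
    best = d_criterion(design)
    for _ in range(n_passes):
        improved = False
        for i in range(n_runs):
            for j in range(n_factors):
                keep = design[i, j]
                for level in levels:
                    design[i, j] = level
                    score = d_criterion(design)
                    if score > best:
                        best, keep, improved = score, level, True
                design[i, j] = keep  # restore the best level found
        if not improved:
            break
    return design, best

design, score = coordinate_exchange(n_runs=12, n_factors=4)
print(design)
```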

What is an example of an algorithmic screening design?

Let’s start with the same scenario presented in the screening design overview page, but with added complications.

Suppose you work at a pharmaceutical company that is developing a manufacturing process for a new drug. You need to identify which factors most strongly influence the drug’s impurity level, with the ultimate goal of applying response surface methodology to those factors to find factor settings that minimize impurity. The factors in this screening experiment are:

- Seven continuous factors: Blend Time (10–30), Pressure (60–80), pH (5–8), Stir Rate (100–120), Catalyst (1–2), Temperature (15–45), and Feed Rate (10–15)
- Vendor, a three-level categorical factor (Cheap, Fast, Good)
- Particle Size, a two-level categorical factor (Small, Large)

You face two complications in designing this experiment. First, resource constraints limit you to 15 runs, which covers less than 2% of the 768 corners of the factor space (2^7 combinations of the continuous-factor extremes times the 3 × 2 combinations of categorical levels). With such a sparse design, you need to assess the main effect of each factor as well as the possibility of quadratic curvature in the effects of one or more of the continuous factors, which requires at least one center point (i.e., a run at the middle value of all continuous factors). Second, you know that running at high Pressure while maintaining low Temperature, and vice versa, is not feasible, so you need to constrain the design to avoid runs in those areas of the factor space. This constraint is depicted graphically below, with the red regions representing areas of the factor space to avoid.
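The exact form of the constraint is not stated here, but DOE software typically lets you express disallowed combinations as inequalities on the factors. The sketch below encodes one hypothetical pair of linear inequalities; the coefficients are assumptions chosen to be consistent with the design table shown later, not values from the original study.

```python
def is_feasible(pressure, temperature):
    """Hypothetical disallowed-combination constraint. Two linear
    inequalities cut off the high-Pressure/low-Temperature and
    low-Pressure/high-Temperature corners; the coefficients below are
    illustrative assumptions, not values from the original study."""
    return 20 <= pressure - temperature <= 60

print(is_feasible(80, 15))      # False: high Pressure, low Temperature corner
print(is_feasible(60, 45))      # False: low Pressure, high Temperature corner
print(is_feasible(69.9, 30.1))  # True: the center of the feasible region
```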

At first, you attempt to build a classical screening design, which immediately limits your options because most classical screening designs assume that all categorical factors have two levels, yet Vendor has three. The classical design available to you (an L18 design) requires a minimum of 18 runs and does not accommodate the Pressure by Temperature constraint, so you reject the classical design in favor of an algorithmic design.

Using DOE software such as JMP, you create a 15-run design with no runs in the infeasible corners of the factor space. The software tells you that the design requires a minimum of 11 runs, the number needed to estimate the intercept and the main effects of seven continuous factors, one three-level categorical factor, and one two-level categorical factor. (Note that a k-level categorical factor requires estimating k-1 model parameters.) That leaves four runs in your budget, so you specify that the algorithm should produce a design with two center points and two replicate runs. Together, these will help you assess whether any continuous effects exhibit quadratic curvature. (We’ll discuss this further when we analyze the data.) A quick check of the 11-run arithmetic appears below, followed by the design table and a graph showing the design in the Pressure by Temperature space.
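A back-of-the-envelope version of the software's parameter count:

```python
# Minimum run count = number of parameters in the main-effects model:
# 1 intercept, 1 slope per continuous factor, and k - 1 parameters for
# each k-level categorical factor.
n_continuous = 7             # Blend Time, Pressure, pH, Stir Rate, Catalyst, Temperature, Feed Rate
categorical_levels = [3, 2]  # Vendor (3 levels), Particle Size (2 levels)

n_params = 1 + n_continuous + sum(k - 1 for k in categorical_levels)
print(n_params)  # 11, leaving 4 of the 15 budgeted runs for center points and replicates
```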

| Run | Blend Time | Pressure | pH | Stir Rate | Catalyst | Temperature | Feed Rate | Vendor | Particle Size |
|-----|------------|----------|-----|-----------|----------|-------------|-----------|--------|---------------|
| 1   | 30         | 80       | 8   | 100       | 1        | 45          | 10        | Cheap  | Small         |
| 2   | 30         | 80       | 5   | 120       | 2        | 45          | 15        | Fast   | Small         |
| 3   | 10         | 65       | 5   | 100       | 2        | 45          | 15        | Cheap  | Large         |
| 4   | 10         | 80       | 8   | 120       | 1        | 45          | 15        | Good   | Large         |
| 5   | 30         | 80       | 5   | 120       | 2        | 45          | 15        | Fast   | Small         |
| 6   | 10         | 60       | 8   | 120       | 2        | 15          | 15        | Cheap  | Small         |
| 7   | 20         | 69.9     | 6.5 | 110       | 1.5      | 30.1        | 12.5      | Good   | Large         |
| 8   | 30         | 60       | 8   | 100       | 1        | 15          | 15        | Fast   | Large         |
| 9   | 10         | 60       | 5   | 120       | 1        | 40          | 10        | Fast   | Small         |
| 10  | 20         | 69.9     | 6.5 | 110       | 1.5      | 30.1        | 12.5      | Good   | Large         |
| 11  | 30         | 60       | 8   | 100       | 1        | 15          | 15        | Fast   | Large         |
| 12  | 30         | 75       | 5   | 120       | 1        | 15          | 10        | Cheap  | Large         |
| 13  | 10         | 80       | 8   | 100       | 2        | 20          | 10        | Fast   | Large         |
| 14  | 10         | 80       | 5   | 100       | 1        | 20          | 15        | Good   | Small         |
| 15  | 30         | 60       | 8   | 100       | 2        | 40          | 10        | Good   | Small         |

You note that the design looks different from any classical screening design you’ve seen before. Both Pressure and Temperature are set at five levels instead of two or three. You also notice that the center points are placed near, but not exactly at, the middle values of the Pressure and Temperature ranges (69.9 for Pressure and 30.1 for Temperature). This is a result of the design algorithm finding a statistically optimal design that obeys your Pressure by Temperature constraint while striving for precise parameter estimates in the analysis phase. You can confirm that the constraint has been obeyed: there are no runs in the top left or bottom right corners of the Pressure by Temperature graph. Instead, the algorithm has placed points as close to those corners as your constraint allowed.
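Both observations can be verified directly from the design table; the check below reuses the hypothetical constraint coefficients from the earlier sketch.

```python
import numpy as np

# Pressure and Temperature settings copied from the design table above.
pressure    = np.array([80, 80, 65, 80, 80, 60, 69.9, 60, 60, 69.9, 60, 75, 80, 80, 60])
temperature = np.array([45, 45, 45, 45, 45, 15, 30.1, 15, 40, 30.1, 15, 15, 20, 20, 40])

print(np.unique(pressure))     # five levels: 60, 65, 69.9, 75, 80
print(np.unique(temperature))  # five levels: 15, 20, 30.1, 40, 45

# Every run satisfies the hypothetical constraint from the earlier sketch
# (20 <= Pressure - Temperature <= 60), so no runs fall in the cut-off corners.
diff = pressure - temperature
print(bool(np.all((diff >= 20) & (diff <= 60))))  # True
```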

Next, you conduct the 15 runs in random order, measuring the resulting impurity level each time, and then analyze the results using a multiple regression model with only main effects. The analysis reveals three factors with p-values less than 0.05, indicating statistical evidence of an effect on Impurity: Temperature, Vendor, and pH. You conclude that the other factors are either inactive or have negligibly small effects.

| Factor | p-value |
|--------|---------|
| Temperature | 0.00204 |
| Vendor | 0.01744 |
| pH | 0.01750 |
| Feed Rate | 0.19999 |
| Catalyst | 0.24683 |
| Blend Time | 0.49980 |
| Stir Rate | 0.52453 |
| Pressure | 0.82430 |
| Particle Size | 0.92482 |
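As an illustration of how such a main-effects fit could be specified outside of dedicated DOE software, the sketch below uses statsmodels. The measured impurity values are not reproduced in this article, so the response column is a clearly labeled placeholder; substituting the real measurements would reproduce a factor-level p-value table like the one above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Design table from above.
runs = pd.DataFrame({
    "BlendTime":    [30, 30, 10, 10, 30, 10, 20, 30, 10, 20, 30, 30, 10, 10, 30],
    "Pressure":     [80, 80, 65, 80, 80, 60, 69.9, 60, 60, 69.9, 60, 75, 80, 80, 60],
    "pH":           [8, 5, 5, 8, 5, 8, 6.5, 8, 5, 6.5, 8, 5, 8, 5, 8],
    "StirRate":     [100, 120, 100, 120, 120, 120, 110, 100, 120, 110, 100, 120, 100, 100, 100],
    "Catalyst":     [1, 2, 2, 1, 2, 2, 1.5, 1, 1, 1.5, 1, 1, 2, 1, 2],
    "Temperature":  [45, 45, 45, 45, 45, 15, 30.1, 15, 40, 30.1, 15, 15, 20, 20, 40],
    "FeedRate":     [10, 15, 15, 15, 15, 15, 12.5, 15, 10, 12.5, 15, 10, 10, 15, 10],
    "Vendor":       ["Cheap", "Fast", "Cheap", "Good", "Fast", "Cheap", "Good", "Fast",
                     "Fast", "Good", "Fast", "Cheap", "Fast", "Good", "Good"],
    "ParticleSize": ["Small", "Small", "Large", "Large", "Small", "Small", "Large",
                     "Large", "Small", "Large", "Large", "Large", "Large", "Small", "Small"],
})
# Placeholder response: replace with the 15 measured impurity values.
runs["Impurity"] = np.random.default_rng(0).normal(10, 2, size=len(runs))

# Main-effects-only multiple regression; C() marks a column as categorical.
model = smf.ols(
    "Impurity ~ BlendTime + Pressure + pH + StirRate + Catalyst"
    " + Temperature + FeedRate + C(Vendor) + C(ParticleSize)",
    data=runs,
).fit()

# One F test per factor (the three-level Vendor gets a single 2-df test),
# analogous to the factor-level p-values reported above.
print(anova_lm(model, typ=2))
```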

You next make a graph to understand the nature of the effects you’ve uncovered. The graph shows that the Temperature and pH effects are both positive, with Temperature having the larger effect across its range. You also see a clear pattern in the effect of Vendor, with Cheap having a notably higher impurity level than Fast or Good. Finally, you notice that the two center points (displayed as open circles) lie well below the lines plotting the effects of Temperature and pH. This suggests quadratic curvature in at least one continuous effect, though center points alone cannot tell you which factor is responsible.

Finally, you consult a lack of fit test, which tests whether your model is missing an effect, such as quadratic curvature. This test requires at least one replicate in the design, and you wisely specified that the algorithm include replicates. The p-value of the test is less than 0.05, so you conclude that the model is missing an effect, which is consistent with your visual assessment of curvature in the graph. You decide to move forward with the three active factors you’ve identified, augmenting your design with new runs to produce a response surface design that allows you to estimate quadratic curvature for both continuous factors as well as all two-factor interactions among the three factors.

Lack of fit

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|--------|----|----------------|-------------|---------|----------|
| Lack of Fit | 1 | 39.73 | 39.73 | 94.11 | 0.0023* |
| Pure Error | 3 | 1.27 | 0.42 | | |
| Total Error | 4 | 41.00 | | | |
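The arithmetic behind such a table is simple enough to verify by hand. The sketch below rebuilds the F ratio and p-value from the table's rounded sums of squares; the replicate pairs (runs 2 and 5, runs 8 and 11, and the two center points, runs 7 and 10) contribute the 3 degrees of freedom of pure error.

```python
from scipy import stats

# Sums of squares from the lack-of-fit table above (rounded values).
ss_total_error, df_total_error = 41.00, 4  # residuals from the main-effects fit
ss_pure_error,  df_pure_error  = 1.27, 3   # within-replicate variation: 3 pairs x 1 df each

# Lack of fit is the residual variation that pure error cannot explain.
ss_lof = ss_total_error - ss_pure_error    # 39.73
df_lof = df_total_error - df_pure_error    # 1

f_ratio = (ss_lof / df_lof) / (ss_pure_error / df_pure_error)
p_value = stats.f.sf(f_ratio, df_lof, df_pure_error)
print(f"F = {f_ratio:.2f}, Prob > F = {p_value:.4f}")
# F is approximately 94 and p approximately 0.002, matching the table
# up to the rounding of the input sums of squares.
```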