Definitive Screening Designs

What are definitive screening designs?

A definitive screening design (DSD) is a specialized experimental design for identifying which of many continuous factors most strongly influence a response. DSDs require a small number of runs while offering advantages over standard screening designs of a similar run size. They reduce ambiguity in identifying active effects, allow you to identify curvature in the effects of individual factors, and support estimation of full quadratic models in a small subset of factors.

Why should you use a definitive screening design?

Definitive screening designs offer numerous advantages over standard screening designs, such as fractional factorial or Plackett-Burman designs. These standard designs can alias some main effects with two-factor interactions and can completely confound two-factor interactions with one another, leading to ambiguity in identifying active effects. In addition, while standard screening designs with center points can detect the presence of curvature, they cannot determine which factors are the source of that curvature without additional runs. By comparison, DSDs keep main effects free of aliasing with two-factor interactions, avoid completely confounding any two-factor interactions with one another, and make the quadratic effect of each individual factor estimable.

DSDs offer these benefits in a small number of runs. For six or more factors, these designs require only slightly more runs than twice the number of factors. For example, with 14 continuous factors, a minimum-sized DSD requires only 29 runs, a small fraction of the corresponding two-level full factorial design (2^14 = 16,384 runs). In comparison, a resolution IV fractional factorial design requires at least 32 runs. Like the DSD, it avoids aliasing main effects and two-factor interactions, but unlike the DSD, it fully confounds some two-factor interactions with one another and is unable to assess quadratic curvature in individual factors, even if center points are added. The DSD collects a relatively large amount of information in an efficient design.
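The run-count comparison above is simple arithmetic. The following is a minimal sketch, assuming an even number of factors so that the minimum-sized DSD has exactly 2m + 1 runs; the function names are illustrative and do not come from any library.

```python
def min_dsd_runs(num_factors: int) -> int:
    """Minimum DSD size, assuming an even factor count: 2m + 1 runs."""
    if num_factors < 2 or num_factors % 2 != 0:
        raise ValueError("this sketch only covers an even number of factors")
    return 2 * num_factors + 1


def full_factorial_runs(num_factors: int, levels: int = 2) -> int:
    """Run count of a full factorial design with the given number of levels."""
    return levels ** num_factors


m = 14
print(min_dsd_runs(m))         # 29
print(full_factorial_runs(m))  # 16384
```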

Beyond screening, DSDs can directly enable response optimization via response surface methodology when the number of active factors is small. A DSD for six or more factors allows estimation of the full quadratic model in any three of the factors, a DSD for 18 or more factors in any four, and a DSD for 24 or more factors in any five. This means that if only a few factors turn out to be active, you can use the same design for both screening and response optimization. Otherwise, you can augment your DSD with additional runs as needed.
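To see why a small active subset matters, it helps to count the terms in a full quadratic model: one intercept, k main effects, k quadratic effects, and k(k-1)/2 two-factor interactions. The sketch below compares that count with the minimum DSD size for the three cases mentioned above, assuming 2m + 1 runs for an even number of factors. Having at least as many runs as terms is a necessary condition for fitting the model, not a sufficient one.

```python
def full_quadratic_terms(k: int) -> int:
    """Intercept + k main effects + k quadratic effects + k(k-1)/2 interactions."""
    return 1 + k + k + k * (k - 1) // 2


# (factors in the DSD, size of the active subset) pairs taken from the text
cases = [(6, 3), (18, 4), (24, 5)]
for total_factors, active in cases:
    runs = 2 * total_factors + 1  # minimum DSD size, even factor count assumed
    terms = full_quadratic_terms(active)
    print(f"{total_factors} factors: {runs} runs, "
          f"{terms} terms for a full quadratic model in {active} factors")
```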

How do you create a definitive screening design?

The benefits of DSDs are due to their special structure. To illustrate, consider this design table for a DSD with six continuous factors. Each factor is coded to have a high value of 1, a low value of -1, and a middle value of 0.

Run X1 X2 X3 X4 X5 X6
1 0 1 1 1 1 1
2 0 -1 -1 -1 -1 -1
3 1 0 -1 1 1 -1
4 -1 0 1 -1 -1 1
5 1 -1 0 -1 1 1
6 -1 1 0 1 -1 -1
7 1 1 -1 0 -1 1
8 -1 -1 1 0 1 -1
9 1 1 1 -1 0 -1
10 -1 -1 -1 1 0 1
11 1 -1 1 1 -1 0
12 -1 1 -1 -1 1 0
13 0 0 0 0 0 0

The DSD is a foldover design, with each run being paired with another run in which all the factors’ values have their signs reversed. For example, Runs 1 and 2 form a foldover pair, with Run 2 simply reversing the signs of the values in Run 1. The same applies to Runs 3 and 4, Runs 5 and 6, and so on, with Run 13 being a center point measured at the middle value of all factors. The foldover aspect of the design eliminates aliasing of main effects and two-factor interactions.

Note that within any foldover pair, one factor is measured at its middle value in both runs, while all other factors are measured at their low or high values. This places points along edges of the factor space, not just in the corners or center, as in standard screening designs. This aspect of the design makes all quadratic effects estimable.

Plotting the design in the first three factors further illustrates the structure of a DSD. Notice that every point (except the center point) has a corresponding foldover point that is in a “mirrored” location across the cube. The design also includes midpoints on the edges of the factor space, with each factor being measured at its midpoint a total of three times when including the center point.
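The structural properties described above can be checked directly on the 13-run table. The sketch below uses only the Python standard library. It verifies the foldover pairing, confirms that every main-effect column has a zero inner product with every two-factor-interaction column, and counts how often each factor sits at its middle level. The helper names are illustrative.

```python
from itertools import combinations

# The 13-run, six-factor DSD from the table above (coded units).
DSD6 = [
    [0, 1, 1, 1, 1, 1],
    [0, -1, -1, -1, -1, -1],
    [1, 0, -1, 1, 1, -1],
    [-1, 0, 1, -1, -1, 1],
    [1, -1, 0, -1, 1, 1],
    [-1, 1, 0, 1, -1, -1],
    [1, 1, -1, 0, -1, 1],
    [-1, -1, 1, 0, 1, -1],
    [1, 1, 1, -1, 0, -1],
    [-1, -1, -1, 1, 0, 1],
    [1, -1, 1, 1, -1, 0],
    [-1, 1, -1, -1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]


def foldover_pairs_ok(design):
    """True if rows (1, 2), (3, 4), ... are sign reversals of each other."""
    body = design[:-1]  # the last row is the center point
    return all(
        [-x for x in body[i]] == body[i + 1] for i in range(0, len(body), 2)
    )


def max_main_vs_interaction(design):
    """Largest |inner product| between a main-effect column and a
    two-factor-interaction column; 0 means no aliasing between them."""
    k = len(design[0])
    worst = 0
    for j in range(k):
        for a, b in combinations(range(k), 2):
            worst = max(worst, abs(sum(r[j] * r[a] * r[b] for r in design)))
    return worst


def midpoint_counts(design):
    """Number of runs at the middle level (0) for each factor."""
    return [sum(1 for r in design if r[j] == 0) for j in range(len(design[0]))]


print(foldover_pairs_ok(DSD6))        # True
print(max_main_vs_interaction(DSD6))  # 0
print(midpoint_counts(DSD6))          # [3, 3, 3, 3, 3, 3]
```

The zero inner products follow from the foldover: the product of three coded values changes sign when every value is reversed, so each pair cancels and the center run contributes nothing.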

This example design represents the minimum number of runs for six continuous factors. In practice, it is recommended to include at least four additional runs, which is achieved by introducing fictitious inactive factors to the design. Doing so greatly increases the DSD’s ability to detect active two-factor interactions and quadratic curvature.

So how can you create your own DSD? Don’t worry about doing it yourself; statistical software, like JMP, can handle the entire design process for you.
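For readers curious about what such software might do internally, one construction described in the DSD literature stacks a conference matrix C (zero diagonal, +1 or -1 elsewhere, mutually orthogonal columns) with its negative and appends a center run. The sketch below assumes that construction and takes C from the odd-numbered runs of the six-factor table above. It is not a description of the algorithm any particular package uses, and the function names are illustrative.

```python
# Conference matrix read off the odd-numbered runs of the six-factor table.
C = [
    [0, 1, 1, 1, 1, 1],
    [1, 0, -1, 1, 1, -1],
    [1, -1, 0, -1, 1, 1],
    [1, 1, -1, 0, -1, 1],
    [1, 1, 1, -1, 0, -1],
    [1, -1, 1, 1, -1, 0],
]


def is_conference_matrix(c):
    """Check zero diagonal, +/-1 off the diagonal, and C'C = (m - 1) I."""
    m = len(c)
    for i in range(m):
        for j in range(m):
            if i == j and c[i][j] != 0:
                return False
            if i != j and c[i][j] not in (-1, 1):
                return False
    for a in range(m):
        for b in range(m):
            gram = sum(c[r][a] * c[r][b] for r in range(m))
            if gram != (m - 1 if a == b else 0):
                return False
    return True


def dsd_from_conference(c):
    """Interleave each row with its foldover, then append the center run."""
    design = []
    for row in c:
        design.append(list(row))
        design.append([-x for x in row])
    design.append([0] * len(c[0]))
    return design


design = dsd_from_conference(C)
print(is_conference_matrix(C))  # True
print(len(design))              # 13
for run_number, row in enumerate(design, start=1):
    print(run_number, row)
```

With the rows interleaved this way, the output reproduces the 13-run table above in the same run order.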

An example of a definitive screening design

Let’s say you are an engineer at a biotechnology firm and are tasked with developing a new extraction process with the goal of maximizing the yield of the extraction, measured in milligrams. You first need to identify which process factors most strongly affect yield, and you begin by testing multiple solvents, pH, and time in solution. Your factors and ranges are:

Factor Low High
Methanol 0 10
Ethanol 0 10
Propanol 0 10
Butanol 0 10
pH 6 9
Time 1 2

You elect to use a DSD because you have all continuous factors, suspect two-factor interactions and quadratic curvature may be present, and after screening would like to fit a full quadratic model in the active factors without the need for many (or any) additional runs. You elect to include four runs beyond the minimum of 13 to better detect second-order effects. You use the following 17-run DSD and record the results in the Yield column.

Run Methanol Ethanol Propanol Butanol pH Time Yield
1 0 10 5 0 6 2 23.43
2 0 0 10 10 7.5 1 4.85
3 5 10 10 10 9 2 40.91
4 10 10 0 10 6 1 21.68
5 0 0 10 0 9 2 3.09
6 10 0 10 0 6 1.5 26.09
7 5 5 5 5 7.5 1.5 30.05
8 0 0 0 10 6 2 11.99
9 0 10 0 10 9 1.5 11.54
10 10 5 10 10 6 2 33.46
11 10 10 0 0 7.5 2 47.44
12 10 0 0 5 9 2 23.58
13 5 0 0 0 6 1 22.26
14 10 0 5 10 9 1 27.07
15 0 10 10 5 6 1 3.35
16 0 5 0 0 9 1 3.18
17 10 10 10 0 9 1 21.67

Visualizing the main effect of each factor shows that Methanol and Time exert strong positive effects on Yield, with Ethanol exerting a lesser positive effect as well. The lines for Propanol, Butanol, and pH appear flat, suggesting negligible main effects for these factors. A main effects-only multiple regression model confirms that the main effects of Methanol, Ethanol, and Time are active, and because you used a DSD, you know that the main effect estimates are not biased by any active two-factor interactions that you have not yet investigated. You choose to move forward with Methanol, Ethanol, and Time as your active factors.
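The main-effect estimates can be reproduced from the table with a few lines of Python. The sketch below codes each factor to the -1, 0, 1 scale and then uses a shortcut: when the coded columns are balanced and mutually orthogonal, which the code checks for this design, the least-squares main-effect coefficient for a factor is the sum of x times y divided by the sum of x squared. It reports estimates only, with no significance tests, and the variable names are illustrative.

```python
FACTORS = ["Methanol", "Ethanol", "Propanol", "Butanol", "pH", "Time"]
# (low, high) in the units of the design table
LEVELS = {"Methanol": (0, 10), "Ethanol": (0, 10), "Propanol": (0, 10),
          "Butanol": (0, 10), "pH": (6, 9), "Time": (1, 2)}
# Methanol, Ethanol, Propanol, Butanol, pH, Time, Yield
RUNS = [
    (0, 10, 5, 0, 6, 2, 23.43), (0, 0, 10, 10, 7.5, 1, 4.85),
    (5, 10, 10, 10, 9, 2, 40.91), (10, 10, 0, 10, 6, 1, 21.68),
    (0, 0, 10, 0, 9, 2, 3.09), (10, 0, 10, 0, 6, 1.5, 26.09),
    (5, 5, 5, 5, 7.5, 1.5, 30.05), (0, 0, 0, 10, 6, 2, 11.99),
    (0, 10, 0, 10, 9, 1.5, 11.54), (10, 5, 10, 10, 6, 2, 33.46),
    (10, 10, 0, 0, 7.5, 2, 47.44), (10, 0, 0, 5, 9, 2, 23.58),
    (5, 0, 0, 0, 6, 1, 22.26), (10, 0, 5, 10, 9, 1, 27.07),
    (0, 10, 10, 5, 6, 1, 3.35), (0, 5, 0, 0, 9, 1, 3.18),
    (10, 10, 10, 0, 9, 1, 21.67),
]


def code(value, low, high):
    """Map a raw setting to coded units: low -> -1, middle -> 0, high -> 1."""
    return (value - (low + high) / 2) / ((high - low) / 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


y = [run[-1] for run in RUNS]
columns = [[code(run[j], *LEVELS[name]) for run in RUNS]
           for j, name in enumerate(FACTORS)]

# The shortcut below is only valid if these two checks pass.
balanced = all(abs(sum(col)) < 1e-9 for col in columns)
orthogonal = all(abs(dot(columns[a], columns[b])) < 1e-9
                 for a in range(len(columns))
                 for b in range(a + 1, len(columns)))

effects = {name: dot(col, y) / dot(col, col)
           for name, col in zip(FACTORS, columns)}

print("balanced:", balanced, "orthogonal:", orthogonal)
for name, estimate in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:9s} {estimate:7.2f}")
```

The estimates are per coded unit, so each one is the expected change in Yield when a factor moves from its middle value to its high value.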

Because you used a six-factor DSD and identified only three active factors, you are able to fit a full quadratic model without adding more runs to the design. You utilize multiple regression with a variable selection method to arrive at a final model to use for optimization. That model reveals that the effect of Methanol exhibits quadratic curvature and that Ethanol and Time exhibit a two-factor interaction: Ethanol exerts a negligible effect when Time is low but a strong positive effect when Time is high.

Using the final model, you identify the optimal factor settings, which are predicted to produce a mean Yield of 45.34 mg.
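The fitting and optimization steps can be sketched in plain Python. The term set used here is an assumption inferred from the description above: an intercept, the three active main effects, a Methanol quadratic term, and the Ethanol by Time interaction. The text does not list the final model's terms, and the grid search is a simple stand-in for whatever optimizer was actually used. All function names are illustrative.

```python
# Methanol, Ethanol, Time, Yield (other columns of the design table omitted)
RUNS = [
    (0, 10, 2, 23.43), (0, 0, 1, 4.85), (5, 10, 2, 40.91),
    (10, 10, 1, 21.68), (0, 0, 2, 3.09), (10, 0, 1.5, 26.09),
    (5, 5, 1.5, 30.05), (0, 0, 2, 11.99), (0, 10, 1.5, 11.54),
    (10, 5, 2, 33.46), (10, 10, 2, 47.44), (10, 0, 2, 23.58),
    (5, 0, 1, 22.26), (10, 0, 1, 27.07), (0, 10, 1, 3.35),
    (0, 5, 1, 3.18), (10, 10, 1, 21.67),
]


def coded(run):
    """Convert raw Methanol, Ethanol, Time settings to coded units."""
    return (run[0] - 5) / 5, (run[1] - 5) / 5, (run[2] - 1.5) / 0.5


def terms(m, e, t):
    """Assumed model terms: intercept, M, E, T, M^2, E*T."""
    return [1.0, m, e, t, m * m, e * t]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Ordinary least squares via the normal equations (X'X) beta = X'y.
X = [terms(*coded(run)) for run in RUNS]
y = [run[3] for run in RUNS]
p = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(XtX, Xty)


def predict(m, e, t):
    return sum(b * x for b, x in zip(beta, terms(m, e, t)))


# Coarse grid search over the coded design region, step 0.05.
grid = [i / 20 - 1 for i in range(41)]
best = max((predict(m, e, t), m, e, t)
           for m in grid for e in grid for t in grid)

pred, m, e, t = best
print("coefficients:", [round(b, 3) for b in beta])
print(f"Methanol={5 + 5 * m:.2f}, Ethanol={5 + 5 * e:.2f}, "
      f"Time={1.5 + 0.5 * t:.2f}")
print(f"predicted mean Yield={pred:.2f}")
```

If the assumed term set matches the final model, the printed prediction should land close to the 45.34 mg quoted above, with Ethanol and Time at their high settings and Methanol somewhat below its high setting because of the negative quadratic term.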