Experimental design methods

How to negotiate experiment size when choosing a design

by Christine Anderson-Cook

In classes where we learn about design of experiments, the problem is often posed as “Select a design with N runs ….” So it is tempting to think about this as a given constraint in our selection process. However, it can be to our great advantage to think about the size of design as a flexible criterion on which we want to evaluate our choices and make comparisons. In recent years, I have been an active advocate for looking at multiple criteria when examining the appropriateness and desirability of designs for the goals of our experiment.¹,²,³


Classical designs like factorial and fractional factorial designs for first-order models, and central composite or Box-Behnken designs for second-order models, have long had an important role in design of experiments if we are interested in spherical or cuboidal design spaces and if the size of our design matches the standard sizes. More recently, computer-generated D- or I-optimal designs have become available to allow greater flexibility when the shape of the design space and/or design size are nonstandard. Both of these categories of designs have strengths and weaknesses, so using the Evaluate Design platform in JMP allows careful exploration of important questions such as:

What kind of power can I expect from the design?
Will I be able to estimate all the terms in my model independently?
If the model is misspecified, how will parameter estimates be affected?
How well will I be able to predict the response in different regions of the design space?

Understanding these important questions before the experiment is run can save us from running an ineffective experiment that cannot do what we want it to, or allow us to choose between alternatives to match the priorities of our experiment.

For many years now, when I have been asked to design an experiment as part of a project, I have created multiple designs that included not only the target design size, but also a couple of choices that are smaller and larger. Since most studies involve questions that cannot be answered with a single experiment, it is valuable to view the management of resources in the context of sequential experimentation. If we can save some resources early in the process, that frees them up for potential later experiments. If we need more resources early to really understand what is going on with our process or product, this can position us to make key decisions and understand the underlying relationship between inputs and outputs. We can then use this knowledge to guide later experiments. This is why I explore multiple design sizes: to understand our choices.

In JMP, the Compare Designs platform makes these comparisons across design sizes easier. Not only can it make comparisons between different design choices of the same size, but it can also provide useful information about different-sized designs. Constructing the right designs to compare is also easy with the Custom Design and Augment Design platforms. Imagine that you are asked to create an 18-run design to estimate the main effects for seven factors. You could use the Custom Design platform in JMP to construct designs with 16, 18 and 20 runs, and perhaps also include a definitive screening design (with a default of 17 runs). Armed with these four designs, we can now consider the impact of design size on their performance. Note that JMP allows comparison of up to three designs in a single window, but it is easy to have two Compare Designs windows open with the same base design to allow for easy comparisons of up to five designs. Table 1 describes some questions of interest with the JMP tools that make it easy to assess these aspects of the designs.

Figures 1 through 3 show a sample of the tools available in JMP for the three (16-, 18- and 20-run) custom designs with the 18-run chosen as the base design. In Figure 1, I have adjusted the different factors to have different sized anticipated coefficients to see the ranges of powers. (Note: These designs are all symmetric with the same power for all effects of the same size.)
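To get a rough feel for how power moves with design size outside of JMP, the following is a minimal sketch, not a reproduction of JMP's power calculation. It assumes an orthogonal design in -1/+1 coding, so that the variance of each main-effect estimate is sigma²/N, and it treats sigma as known (a normal approximation rather than the noncentral F approach that accounts for error degrees of freedom). The function name `approx_power` and the anticipated coefficient values are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_runs, coef, sigma=1.0, alpha=0.05):
    """Approximate two-sided power to detect one main effect.

    Assumptions (illustrative, not from the article):
    - orthogonal design with factors coded -1/+1, so
      Var(estimate) = sigma**2 / n_runs
    - sigma is treated as known, so a normal (z) test is used
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    # Standardized size of the anticipated coefficient.
    shift = abs(coef) * sqrt(n_runs) / sigma
    # Probability of rejecting in either tail when the true effect is `coef`.
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)


# Compare the three candidate sizes for a few anticipated coefficients.
for n in (16, 18, 20):
    powers = [round(approx_power(n, c), 3) for c in (0.5, 0.75, 1.0)]
    print(n, powers)
```

Because the approximation ignores the error degrees of freedom and any correlation between effects, it will tend to be optimistic for small designs; it is meant only to show the direction and rough size of the change as runs are added.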

Figure 2 provides one summary of the correlations. You’ll see there are interesting comparisons to consider: The 18-run design has some slightly correlated main effects in the primary model, while the 16-run and 20-run designs have no correlation between any terms in the primary or possible models. It is not uncommon to have some sample sizes that are less suited to good balance and symmetry. The 16-run design has the smallest correlations between all sets of terms considered in Figure 2, but it has slightly less power than the 18- and 20-run designs.
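The kind of correlation summary described here can be sketched for any design held as a list of runs. The example below is an assumption-laden stand-in: instead of a JMP custom design, it uses a textbook 16-run 2^(7-3) fractional factorial (generators E=ABC, F=BCD, G=ACD), and it checks the largest absolute correlation among main effects and between main effects and two-factor interactions. The helper names are hypothetical.

```python
from itertools import combinations, product
from math import sqrt


def fractional_factorial_16():
    """16-run 2^(7-3) design: full factorial in A-D, E=ABC, F=BCD, G=ACD.

    This is a classical design used as a stand-in, not a JMP custom design.
    """
    return [(a, b, c, d, a * b * c, b * c * d, a * c * d)
            for a, b, c, d in product((-1, 1), repeat=4)]


def correlation(x, y):
    """Pearson correlation between two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


design = fractional_factorial_16()

# Columns for the seven main effects and the 21 two-factor interactions.
main = [list(col) for col in zip(*design)]
two_fi = [[u * v for u, v in zip(main[i], main[j])]
          for i, j in combinations(range(len(main)), 2)]

max_main_main = max(abs(correlation(x, y)) for x, y in combinations(main, 2))
max_main_2fi = max(abs(correlation(x, y)) for x in main for y in two_fi)

print("max |r| among main effects:", max_main_main)
print("max |r| main effect vs two-factor interaction:", max_main_2fi)
```

Swapping in the rows of an exported 18- or 20-run design for `design` would give the corresponding summaries for those sizes.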

Figure 3 shows a summary of the overall prediction variance profile across the seven-factor design space.⁴ In this case, there seems to be a similar reduction in the prediction variance with each increase in sample size. How we value the differences between the designs is dependent on our priorities and our tolerance for different costs.
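For readers who want to see what sits behind a prediction variance profile, here is a minimal sketch under stated assumptions. It computes the relative prediction variance x'(X'X)⁻¹x for a main-effects model and summarizes it over random points in the cuboidal region [-1, 1]⁷, in the spirit of a fraction of design space summary. The design is again a textbook 16-run 2^(7-3) fractional factorial used as a stand-in, and the function names, sample size and seed are hypothetical.

```python
import random
from itertools import product


def model_row(point):
    """Main-effects model expansion: intercept plus the factor settings."""
    return [1.0] + [float(v) for v in point]


def information_matrix(design):
    """X'X for the main-effects model."""
    rows = [model_row(run) for run in design]
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)]
            for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: model not estimable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def relative_prediction_variance(xtx, point):
    """x'(X'X)^{-1}x: variance of the predicted mean divided by sigma^2."""
    x = model_row(point)
    return sum(u * v for u, v in zip(x, solve(xtx, x)))


def fds_values(xtx, n_factors, n_points=1000, seed=1):
    """Sorted relative prediction variances at random points in [-1, 1]^k."""
    rng = random.Random(seed)
    return sorted(
        relative_prediction_variance(
            xtx, [rng.uniform(-1, 1) for _ in range(n_factors)])
        for _ in range(n_points))


# Stand-in design: 2^(7-3) with E=ABC, F=BCD, G=ACD (not a JMP custom design).
design = [(a, b, c, d, a * b * c, b * c * d, a * c * d)
          for a, b, c, d in product((-1, 1), repeat=4)]
xtx = information_matrix(design)

center = relative_prediction_variance(xtx, [0] * 7)
corner = relative_prediction_variance(xtx, [1] * 7)
profile = fds_values(xtx, 7)

print("center:", center, "corner:", corner)
print("median over region:", profile[len(profile) // 2])
```

Running the same functions on each candidate design and comparing the sorted profiles gives a plain-number version of the between-size comparison discussed above.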

Once we have examined these characteristics of competing designs, we can make the right choice for our particular study. In addition, it will be much easier to defend our choice of design, because we understand what performance is realistic to expect before any resources are spent. If we need to make the case that a larger experiment is needed, this discussion will be concrete and driven by available information. If we are able to suggest a smaller experiment than originally requested, then we may buy leverage for the next experimental problem when perhaps more resources are needed early in the process. Whenever we can translate a fixed constraint into a tactical choice made for a justifiable reason, we are going to feel much more comfortable about our choice.

Many times, the initial resources allocated to the experiment are not sufficient to realistically meet our goals. By exploring alternative design sizes, we can put ourselves into a better negotiation position for getting additional experimental resources. How would you rather state your case: “A bigger experiment would be better,” or “The current experiment is not adequate for the power and prediction variance that we need to be able to answer the questions that the study is tasked to tackle. Here is an experiment that satisfies our needs”? We might not always win that negotiation and be able to increase the size of the design to our desired target, but we are making a compelling, data-driven argument that highlights the consequences and tradeoffs of several alternatives.

References

1. Lu, L., Anderson-Cook, C.M., Robinson, T.J. (2011) “Optimization of Designed Experiments Based on Multiple Criteria Utilizing a Pareto Frontier,” Technometrics 53 353-365.
2. Lu, L., Anderson-Cook, C.M. (2012) “Rethinking the Optimal Response Surface Design for a First-Order Model with Two-Factor Interactions, when Protecting against Curvature,” Quality Engineering 24 404-422.
3. Lu, L., Anderson-Cook, C.M., Lin, D. (2014) “Optimal Designed Experiments Using a Pareto Front Search for Focused Preference of Multiple Objectives,” Computational Statistics and Data Analysis 71 1178-1192.
4. Zahran, A., Anderson-Cook, C.M., Myers, R.H. (2003) “Fraction of Design Space to Assess the Prediction Capability of Response Surface Designs,” Journal of Quality Technology 35 377-386.

JMP Foreword

This article appeared in JMP Foreword magazine.
