Fractional Factorial Designs

What is a fractional factorial design?

Fractional factorial designs are a type of classic screening experiment that can be used when a full factorial design – all possible treatment combinations of the factors – is too time consuming or expensive. A fractional factorial design is a subset, or fraction, of a full factorial design, where the chosen subset of treatment combinations allows you to estimate at least the main effects and possibly some or all two-factor interactions.

When should I use a fractional factorial design?

Like any screening design, you would use a fractional factorial design in the early stages of experimentation when your goal is to screen factors and identify the most important ones. However, modern screening methods, such as algorithmic designs or definitive screening designs, offer more flexibility in terms of the number of runs required or which effects you can estimate.

How do I use fractional factorial designs?

Most screening designs, including fractional factorial designs, will have some level of confounding, or aliasing, because you don’t run every possible treatment combination as you would in a full factorial. In other words, some of the effects will be confounded (or aliased) with other effects. When two or more effects are confounded with one another, it’s impossible to separate their individual impacts on the response during the data analysis stage.

Let’s focus on the two-level fractional factorial design, where every factor has just two levels. This design is a subset of a two-level full factorial, or $2^k$ design, where k is the number of factors. The number of treatments is a power of two, and the designs are typically defined by the proportion of the full factorial design for the same number of factors (e.g., $\frac{1}{2}$ fraction, $\frac{1}{4}$ fraction, $\frac{1}{8}$ fraction, etc.).

Generators: Two-level fractional factorial designs are denoted $2^{k-r}$, where r is the number of generators used to choose the subset of the full factorial design. (Some sources denote this as p, rather than r.) Generators are higher-order interaction terms that specify how to choose the fraction of runs from the full factorial design. They are the rules that determine which effects are confounded with one another. You can use generators to construct fractional factorial designs manually, but most people simply use statistical software for that!
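To make the idea of generators concrete, here is a minimal sketch in Python (using numpy and itertools) of how a two-level fractional factorial design can be constructed by hand. The function name fractional_factorial and the generator choice X3 = X1*X2 are illustrative, not the output of any particular statistical package.

```python
import itertools
import numpy as np

def fractional_factorial(n_base, generators):
    """Build a 2^(k-r) design in coded units (-1/+1).

    n_base: number of base factors run as a full factorial (k - r)
    generators: one tuple of base-column indices per generated factor,
        e.g. [(0, 1)] means the generated factor equals the product of
        the first two base columns.
    """
    base = np.array(list(itertools.product([-1, 1], repeat=n_base)))
    extra = [base[:, cols].prod(axis=1, keepdims=True) for cols in generators]
    return np.hstack([base] + extra)

# A 2^(3-1) design: full factorial in X1 and X2, with the generator X3 = X1*X2
print(fractional_factorial(2, [(0, 1)]))  # four runs instead of eight
```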

Resolution: The resolution of a fractional factorial design indicates how well you can separate main effects and interactions. In other words, it measures the degree of confounding in the design. The higher the resolution of a design, the less confounding there is. In a resolution III design, the main effects are not confounded with any other main effects, but are confounded with at least some two-factor interactions. In a resolution IV design, the main effects are not confounded with other main effects or with any two-factor interactions, but two-factor interactions might be confounded with one another. In resolution V designs, the main effects and all two-factor interactions can be estimated, but two-factor interactions are confounded with three-factor interactions.

Let’s imagine a three-factor fractional factorial design in four runs, or a $2^{3-1}$ design.

This is a resolution III design, so the main effects (X1, X2, and X3) are not confounded (aliased) with one another, but they are each confounded with a two-factor interaction.

Effects Aliases
X1 = X2*X3
X2 = X1*X3
X3 = X1*X2

In this scenario, X1 is confounded with the two-factor interaction, X2*X3. When you analyze the data from the experiment, the statistical model will include estimates for X1, X2, and X3 only; the interaction effects are not estimable. In your model, the estimate for X1 is actually the combination of the effects of both X1 and the X2*X3 interaction, which means that if the estimate is statistically significant, you don’t know whether it’s the main effect or the interaction that’s important. You would need to perform additional experimental runs to isolate their individual effects.
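A quick numeric check (continuing the illustrative construction above) shows why these effects cannot be separated: in the four-run design built with the generator X3 = X1*X2, the X1 column is identical to the element-wise product of the X2 and X3 columns.

```python
import itertools
import numpy as np

base = np.array(list(itertools.product([-1, 1], repeat=2)))  # full factorial in X1, X2
x1, x2 = base[:, 0], base[:, 1]
x3 = x1 * x2                                                 # generated factor

# The X1 column and the X2*X3 column are the same vector, so least squares
# can only estimate their combined effect.
print(np.array_equal(x1, x2 * x3))  # True
```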

Fractional factorial designs: An example

Let’s consider a scenario where you want to study how factors in a semiconductor manufacturing process might influence the Thickness of the resulting thin film grown on wafers. The four factors (and their low and high levels) are: Gas Flow (250, 275), Temp (825, 850), LF (Low Frequency) Power (180, 260), and HF (High Frequency) Power (750, 850).

A full factorial experiment using every possible treatment combination would require 16 runs, but you’d like to determine which factors influence Thickness in a smaller experiment. A fractional factorial design with eight runs (a $\frac{1}{2}$ fraction) would be a resolution IV design, meaning that the main effects won’t be confounded with one another or with any two-factor interactions. You’ll also be able to estimate some two-factor interactions, although they will be confounded with other two-factor interactions. The eight treatment combinations for this $2^{4-1}$ experiment are illustrated below.
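The sketch below lists those eight coded treatment combinations and maps them to the actual factor settings, assuming the half fraction is built with the generator HF Power = Gas Flow*Temp*LF Power (the specific generator is an assumption for illustration; software may choose an equivalent one).

```python
import itertools

# Low and high levels for each factor (from the text)
levels = {
    "Gas Flow": (250, 275),
    "Temp": (825, 850),
    "LF Power": (180, 260),
    "HF Power": (750, 850),
}

runs = []
for a, b, c in itertools.product([-1, 1], repeat=3):  # Gas Flow, Temp, LF Power
    d = a * b * c                                      # generator: HF Power = product of the others
    coded = (a, b, c, d)
    runs.append({name: levels[name][(x + 1) // 2] for name, x in zip(levels, coded)})

for run in runs:
    print(run)
```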

The confounding of the two-factor interactions is as follows:

Effects Aliases
Gas Flow*Temp = LF Power*HF Power
Gas Flow*LF Power = Temp*HF Power
Gas Flow*HF Power = Temp*LF Power

When you fit a model to your data, only one interaction from each set of confounded interactions can be estimated. For example, the model can include Gas Flow*Temp or LF Power*HF Power, but not both. And remember that whichever term you include in the model, the estimate is the sum of both effects; they cannot be separated. We’ll include the interactions from the left column when we fit the model, but this choice is arbitrary (in this example it’s determined by the order in which the main effects were entered in the design).

You run the experiment and record the results for the response, Thickness. The Pattern column in the table shown indicates the low (−) and high (+) values of the factors in each treatment combination. Notice that the treatments are run in random order.

Analyzing a fractional factorial experiment

Analyzing experimental results is the process of building a statistical model that describes the impact the factors have on the response. With a total of eight runs, you can fit a model that includes the intercept and seven terms: four main effects, and three of the six two-factor interactions. This is called a saturated model, meaning it has the same number of estimates (parameters) as data points.
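For illustration, here is a minimal sketch of fitting that saturated model with ordinary least squares, again assuming the generator HF Power = Gas Flow*Temp*LF Power. The thickness values below are placeholders, since the article’s data are not reproduced here.

```python
import itertools
import numpy as np

# Coded 2^(4-1) design: A = Gas Flow, B = Temp, C = LF Power, D = HF Power
base = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = base.T
D = A * B * C                                     # generator

# Intercept + 4 main effects + the 3 estimable two-factor interactions
X = np.column_stack([np.ones(8), A, B, C, D, A * B, A * C, A * D])

thickness = np.zeros(8)                           # placeholder response; use your own data
coef, *_ = np.linalg.lstsq(X, thickness, rcond=None)

# 8 runs and 8 parameters: zero residual degrees of freedom, so there are no
# standard errors, t-ratios, or p-values for these estimates.
print(coef)
```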

The main effects are not confounded with the two-factor interactions, but they are each confounded with three-factor interactions. Fortunately, the hierarchy principle of screening designs tells us that higher-order model terms, like three-factor interactions, are typically less likely to be important than lower-order model terms, like main effects. Based on this principle and your subject matter knowledge, you might be willing to make the assumption that the main effects are the primary contributors to the estimates, while the contributions from the three-factor interactions are negligible.

As noted above, the two-factor interactions are each confounded with another two-factor interaction. If the Gas Flow*Temp interaction is an important effect in the model, for example, it’s possible that the important effect is really LF Power*HF Power, or perhaps both interaction effects contribute to the response.

The fitted model below gives estimates for each term, but in a saturated model there are no degrees of freedom to calculate the standard error, so you cannot get a t-ratio or p-value (denoted Prob>|t| in the image below). In other words, you cannot test the statistical significance of the estimates.

Note: These are standardized estimates

Fortunately, there is a way to evaluate the importance of effects when you’re analyzing a saturated model, and it’s based on the sparsity principle of screening designs. The sparsity principle is based on the assumption that only a few of the effects have a significant impact on the response, while the rest can be treated as random noise. These non-significant effects are used to produce an estimate of experimental error known as Lenth’s Pseudo Standard Error (Lenth's PSE).
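A minimal sketch of the standard calculation of Lenth’s PSE, applied to a vector of standardized effect estimates (the lenth_pse name is just for illustration):

```python
import numpy as np

def lenth_pse(effects):
    """Lenth's pseudo standard error for a set of effect estimates."""
    abs_eff = np.abs(np.asarray(effects, dtype=float))
    s0 = 1.5 * np.median(abs_eff)                 # initial, rough scale estimate
    trimmed = abs_eff[abs_eff < 2.5 * s0]         # drop effects too large to be noise
    return 1.5 * np.median(trimmed)
```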

The absolute values of the standardized effect estimates are plotted in a graph called a half-normal plot (or a half-normal probability plot), along with a line whose slope is equal to Lenth’s PSE. Effects that fall close to the line can be considered unimportant (i.e., they are random noise), while important effects will fall farther from the line.
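A sketch of such a half-normal plot, assuming matplotlib and scipy are available and reusing the lenth_pse helper above; the effect names and values you pass in would come from your own fitted model.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def half_normal_plot(effects, names, pse):
    abs_eff = np.abs(np.asarray(effects, dtype=float))
    order = np.argsort(abs_eff)
    m = len(abs_eff)
    # Half-normal plotting positions for ranks 1..m
    q = stats.norm.ppf(0.5 + 0.5 * (np.arange(1, m + 1) - 0.5) / m)

    plt.scatter(q, abs_eff[order])
    for qi, ei, label in zip(q, abs_eff[order], np.asarray(names)[order]):
        plt.annotate(label, (qi, ei))
    plt.plot(q, pse * q)                          # reference line with slope = Lenth's PSE
    plt.xlabel("Half-normal quantile")
    plt.ylabel("|Standardized effect|")
    plt.show()
```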

There are two main effects identified in the plot as important: LF Power and HF Power. The third important effect identified is the interaction Gas Flow*Temp. This is where subject matter expertise and revisiting the confounding of effects in the design can help. Recall the aliases for the two-factor interactions:

Effects Aliases
Gas Flow*Temp = LF Power*HF Power
Gas Flow*LF Power = Temp*HF Power
Gas Flow*HF Power = Temp*LF Power

Gas Flow*Temp is confounded with LF Power*HF Power, meaning the estimate for Gas Flow*Temp is really reflecting the impact of both interaction terms. They might contribute equally to the response, or one might be negligible while the other is the truly important one.

The heredity principle of screening designs tells us that the presence of a higher-order term is typically associated with the presence of lower-order effects of the same factors. In this example, the important main effects are LF Power and HF Power, not Gas Flow or Temp, which suggests that it might be their interaction that’s important. If a subject matter expert knows that LF Power and HF Power often interact in this type of process, but they don’t expect there to be an interaction between Gas Flow and Temp at the levels used in this experiment, that would provide additional support for such a conclusion.

Conclusion

The purpose of this initial experiment was to screen four candidate factors in a small number of runs and fit a model to find the most important ones. You can use these results to guide the next phases of experimentation. For example, you might reduce the model by removing the main effects and interactions that appear to be unimportant (based on the half-normal plot and subject matter expertise), run an experiment to confirm those results, and then use a response surface experiment to find the best settings of the factors to optimize the response.

If the results are not as clear as in this example, you could augment the data you collected with additional experimental runs. One option would be to run the treatments from the full factorial design that weren’t included in your original fractional factorial design. Or you might choose runs that are targeted to allow you to estimate specific interactions of interest, rather than completing all the runs of the full factorial design.
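For the first option, the complementary half fraction is easy to write down: it consists of the runs of the full $2^4$ factorial where HF Power does not equal the product Gas Flow*Temp*LF Power (again assuming that generator). A sketch:

```python
import itertools
import numpy as np

full = np.array(list(itertools.product([-1, 1], repeat=4)))    # all 16 coded runs
in_original = full[:, 3] == full[:, :3].prod(axis=1)           # D = A*B*C
complementary = full[~in_original]                             # the 8 runs not yet performed
print(complementary)
```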

This approach of first running a fractional factorial experiment – or any other type of screening design – and then following up with further experimentation is often a more efficient approach and a better use of resources than running the full factorial from the outset.