Key Concepts for Design of Experiments
What concepts and terms are fundamental to design of experiments (DOE)?
If you’re considering running a designed experiment, it is important to be familiar with the following terms: response, factor, factor level, treatment combination, main effect, interaction, model, experimental run, experimental unit, replication, randomization, confounding, and blocking.
What designs are common and useful to know as a beginner?
Full factorial, fractional factorial, and response surface designs are often the first designs an experimenter will encounter. Understanding the concepts behind these designs helps you build a solid foundation, but you can also skip ahead to the powerful modern optimized designs that will save you time, money, and resources. The classic designs are often just special cases of the modern optimized designs.
Design of experiments terminology
A factor (or experimental factor) is a variable that is studied in a designed experiment, while the response is the variable that measures the outcome of interest. In a designed experiment, you might have many factors, and you can have more than one response. Some variables might be controlled, or held constant, during the experiment. There can also be uncontrolled noise or nuisance variables. These variables are sometimes called lurking variables.
When two factors interact, the effect of one variable on the response depends on the value of the other variable. The effects of these two factors on the response are not additive. So, studying one factor while ignoring the other can lead to incorrect conclusions about the response.
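A small numeric sketch can make the "not additive" idea concrete. The function below is only an illustration: the factor names A and B, the level names `low` and `high`, and all cell means are hypothetical and do not come from a real experiment. It compares the effect of factor A at each level of factor B; a difference of zero means the two effects simply add.

```python
def interaction_contrast(cell_means):
    """Effect of factor A when B is 'high' minus effect of A when B is 'low'.

    `cell_means` maps (a_level, b_level) to a mean response, with both
    levels named 'low' and 'high'. A result of zero means the effects of
    A and B are additive (no interaction).
    """
    effect_a_at_b_low = cell_means[("high", "low")] - cell_means[("low", "low")]
    effect_a_at_b_high = cell_means[("high", "high")] - cell_means[("low", "high")]
    return effect_a_at_b_high - effect_a_at_b_low


# Hypothetical cell means, for illustration only.
additive = {
    ("low", "low"): 10.0, ("high", "low"): 14.0,
    ("low", "high"): 13.0, ("high", "high"): 17.0,
}
interacting = {
    ("low", "low"): 10.0, ("high", "low"): 14.0,
    ("low", "high"): 13.0, ("high", "high"): 9.0,
}

print(interaction_contrast(additive))     # 0.0: A raises the response by 4 at both levels of B
print(interaction_contrast(interacting))  # -8.0: A raises the response at B low, lowers it at B high
```

In the second data set, studying A only at the low level of B would suggest that A always increases the response, which is the kind of incorrect conclusion described above.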
A factor level is a value of the factor that is used in the experiment, and a treatment is a unique combination of factor levels.
For example, suppose you are studying a cleaning process for titanium parts, and have two factors you’d like to study: Bath Time and Solution Type. Bath Time is a continuous variable, with two factor levels of interest: 10 minutes and 30 minutes. Solution Type is a categorical variable, with three levels: Types 1, 2, and 3. There are six possible treatments. The response is Residual Surface Contaminants.
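As a quick check on the count, the six treatments can be enumerated by crossing the two lists of factor levels. This is a minimal sketch using only the Python standard library; the variable names are chosen here for illustration.

```python
from itertools import product

# Factor levels from the cleaning example.
bath_time_minutes = [10, 30]
solution_type = ["Type 1", "Type 2", "Type 3"]

# Each treatment is one unique combination of factor levels.
treatments = list(product(bath_time_minutes, solution_type))

for number, (minutes, solution) in enumerate(treatments, start=1):
    print(f"Treatment {number}: Bath Time = {minutes} min, Solution Type = {solution}")

print(len(treatments))  # 2 levels x 3 levels = 6
```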
In a full factorial experiment, or a factorial DOE, you run all possible combinations of factor levels (all possible treatments). For each trial in the experiment, or experimental run, you apply a treatment and record the response. When you randomize the experiment, you run the treatments in random order. Randomization averages out the effects of uncontrolled, or lurking, variables.
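One simple way to randomize is to shuffle the list of treatments and run them in the shuffled order. The sketch below assumes a single replicate of the six cleaning treatments; the function name and the fixed seed are illustrative choices (the seed is there only so the order can be reproduced), not part of any particular DOE tool.

```python
import random
from itertools import product

# All six treatments from the cleaning example.
treatments = list(product([10, 30], ["Type 1", "Type 2", "Type 3"]))


def randomized_run_order(treatments, seed=None):
    """Return a shuffled copy of the treatments; the input list is left unchanged."""
    rng = random.Random(seed)
    order = list(treatments)
    rng.shuffle(order)
    return order


# The seed value is arbitrary; omit it to get a different order each time.
run_order = randomized_run_order(treatments, seed=2024)
for run_number, (minutes, solution) in enumerate(run_order, start=1):
    print(f"Run {run_number}: Bath Time = {minutes} min, Solution Type = {solution}")
```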
To illustrate this point, let’s suppose that you conduct the cleaning experiment without randomization. It takes one day to conduct all six trials. You run all of the treatments with Bath Time at 10 minutes in the morning, and all of the treatments with Bath Time at 30 minutes in the afternoon. Meanwhile, both the ambient temperature and humidity increase throughout the day.
In the subsequent analysis, you might conclude that Bath Time is significant. However, because you didn't randomize the treatments, you can't separate the effect of Bath Time from the effects of ambient temperature and humidity. Therefore, these effects are considered confused, or confounded. Randomizing the treatments, however, can prevent this confusion.
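The problem with the unrandomized schedule can be checked mechanically. The sketch below assumes, as a simplification, that the first three runs of the day happen in the morning and the last three in the afternoon. It then asks whether each Bath Time level was run in only one period. The mixed order is hand-picked to stand in for the kind of schedule randomization tends to produce; a random shuffle can still land on an unlucky order by chance, so it is worth inspecting the run order before starting.

```python
from itertools import product


def periods_by_bath_time(run_order, runs_per_period=3):
    """Map each Bath Time level to the set of day periods in which it was run.

    Assumes (as a simplification) that the first `runs_per_period` runs take
    place in the morning and the remaining runs in the afternoon.
    """
    periods = {}
    for position, (minutes, _solution) in enumerate(run_order):
        period = "morning" if position < runs_per_period else "afternoon"
        periods.setdefault(minutes, set()).add(period)
    return periods


def is_confounded_with_period(run_order):
    """True if every Bath Time level was run in only one period of the day."""
    return all(len(p) == 1 for p in periods_by_bath_time(run_order).values())


# Unrandomized schedule: all 10-minute runs first, then all 30-minute runs.
unrandomized = list(product([10, 30], ["Type 1", "Type 2", "Type 3"]))

# A hand-picked mixed order, for illustration only.
mixed = [(10, "Type 1"), (30, "Type 2"), (10, "Type 3"),
         (30, "Type 1"), (10, "Type 2"), (30, "Type 3")]

print(is_confounded_with_period(unrandomized))  # True
print(is_confounded_with_period(mixed))         # False
```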
In the cleaning experiment, each treatment is run once. You can repeat, or replicate, a treatment, or you can replicate all of the treatments. In a fully replicated experiment, all the treatments are replicated.
Replication enables you to estimate the experimental error, which is the unexplained variation in your experiment (that is, the variation in your response that is not explained by changing your factors).
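One common way to turn replicates into an error estimate is to pool the within-treatment variances, weighting each by its degrees of freedom. The sketch below does this for a fully replicated version of the cleaning experiment. The response values are made up for illustration, and the function name is a choice made here rather than a standard API.

```python
from statistics import variance

# Made-up responses (Residual Surface Contaminants, arbitrary units),
# for illustration only: two replicate runs per treatment.
replicates = {
    (10, "Type 1"): [12.0, 14.0],
    (10, "Type 2"): [9.0, 10.0],
    (10, "Type 3"): [15.0, 15.0],
    (30, "Type 1"): [8.0, 10.0],
    (30, "Type 2"): [5.0, 6.0],
    (30, "Type 3"): [11.0, 14.0],
}


def pooled_error_variance(replicates):
    """Pool within-treatment sample variances, weighted by degrees of freedom.

    Returns (pooled variance, total degrees of freedom).
    """
    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for responses in replicates.values():
        if len(responses) < 2:
            continue  # a treatment run only once says nothing about error
        sum_of_squares += variance(responses) * (len(responses) - 1)
        degrees_of_freedom += len(responses) - 1
    if degrees_of_freedom == 0:
        raise ValueError("at least one treatment must be replicated")
    return sum_of_squares / degrees_of_freedom, degrees_of_freedom


error_variance, dof = pooled_error_variance(replicates)
print(f"Estimated error variance: {error_variance:.3f} on {dof} degrees of freedom")
```

Because the runs within a treatment share the same factor settings, any spread among them cannot be explained by the factors, which is why it serves as the estimate of experimental error.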
Finally, in a response surface experiment, you use more than two levels of your continuous factors. By doing so, you can model the curvature in the relationship between the factors and the response.
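To see why two levels are not enough, note that any two points can be joined by a straight line, so curvature is invisible. With a third level midway between the other two, you can compare the center response with the average of the end responses. The sketch below assumes a hypothetical middle Bath Time level of 20 minutes and uses made-up mean responses; neither comes from the example above.

```python
def curvature_estimate(low_response, center_response, high_response):
    """Center response minus the average of the two end responses.

    Zero means the three points lie on a straight line. Assumes the center
    level is midway between the low and high levels.
    """
    return center_response - (low_response + high_response) / 2


# Made-up mean responses at Bath Time = 10, 20, and 30 minutes (illustration only).
print(curvature_estimate(12.0, 7.0, 8.0))   # -3.0: the response dips below the straight line
print(curvature_estimate(12.0, 10.0, 8.0))  # 0.0: no curvature detected
```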
Below is a review of key terms used in our example:
| Term | Definition | Example |
| --- | --- | --- |
| Response | The variable that measures the outcome of interest. | Residual Surface Contaminants is the response. |
| Factor | An independent or predictor variable that is a possible source of variation in the response variable. | Bath Time and Solution Type are the factors. |
| Factor level | A particular value of a factor. In other words, it is the specific type or amount of a factor used in the experiment. | Bath Time is continuous, with two factor levels: 10 minutes and 30 minutes. Solution Type is categorical, with three levels: Types 1, 2, and 3. |
| Treatment or design point | A combination of factor levels used in the experiment. In single-factor studies, a treatment is the same as a factor level. | There are six possible treatments in the example: each combination of the two Bath Time levels and the three Solution Type levels. |
| Interaction | A situation where the effect of one variable on the response changes depending on the value of the other variable. | Bath Time and Solution Type interact if the effect of Bath Time on the response is different depending on the value of Solution Type. For example, if increasing Bath Time increases Residual Surface Contaminants when Solution Type 1 is used but decreases it when Solution Type 2 is used, then there is an interaction between these two factors. |
| Effect | The change in the mean response due to a change in the factor level. | The effect of Bath Time on the response is the increase (or decrease) in Residual Surface Contaminants when increasing Bath Time by x units. The effect of Solution Type on the response is the increase (or decrease) in Residual Surface Contaminants when changing Solution Type from 1 to 2. |
| Experimental unit | What receives the treatment. | The treatment combinations are randomized to a set of available titanium parts. The titanium part is the experimental unit. |
| Run | A single observation for a treatment. In other words, a run is the combination of factor levels and the value of the response variable. | The experimental process is run for one experimental unit: applying the randomized treatment combination and measuring the response. This is one run of the experiment. |
| Replication | Occurs when you assign the same treatment again to a new experimental unit. | Any treatment combinations that will be randomized and run more than once are considered replicated. |
| Uncontrolled variable(s) | A source of variation in the response that is not controlled in the experiment. These variables are sometimes called nuisance variables or lurking variables. | There may be environmental factors (like bath temperature, ambient temperature, or ambient humidity) or sourcing factors (production differences in materials from different vendors) that cause variation in the response but are not controlled in the experiment. |
| Confounded variables | Variables whose effects on the response cannot be separated from one another, so the true source of an effect is unclear. If you fail to include an important source of variation in the experiment, you may confound the effect of a controlled experimental variable with the lurking effect of an uncontrolled nuisance variable. | If ambient temperature and humidity are important but uncontrolled sources of variation, and their changes coincide with changes in the experimental factor Bath Time, the effect of Bath Time will be confounded with the effects of ambient temperature and humidity. |