One-Way ANOVA

What is one-way ANOVA?

One-way analysis of variance (ANOVA) is a statistical method for testing for differences in the means of three or more groups.

How is one-way ANOVA used?

One-way ANOVA is typically used when you have a single independent variable, or factor, and your goal is to investigate whether variations, or different levels, of that factor have a measurable effect on a dependent variable.

What are some limitations to consider?

One-way ANOVA can only be used when investigating a single factor and a single dependent variable. When comparing the means of three or more groups, it can tell us if at least one pair of means is significantly different, but it can’t tell us which pair. Also, it requires that the dependent variable be normally distributed in each of the groups and that the variability within groups is similar across groups.

One-way ANOVA is a test for differences in group means

One-way ANOVA is a statistical method to test the null hypothesis (H0) that three or more population means are equal vs. the alternative hypothesis (Ha) that at least one mean is different. Using the formal notation of statistical hypotheses, for k means we write:

$ H_0:\mu_1=\mu_2=\cdots=\mu_k $

$ H_a: \text{not all means are equal} $

where $\mu_i$ is the mean of the i-th level of the factor.

Okay, you might be thinking, but in what situations would I need to determine if the means of multiple populations are the same or different? A common scenario is that you suspect a particular independent process variable is a driver of an important result of that process. For example, you may have suspicions about how different production lots, operators, or raw material batches are affecting the output (aka a quality measurement) of a production process.

To test your suspicion, you could run the process using three or more variations (aka levels) of this independent variable (aka factor), and then take a sample of observations from the results of each run. If you find differences when comparing the means from each group of observations using an ANOVA, then (assuming you’ve done everything correctly!) you have evidence that your suspicion was correct—the factor you investigated appears to play a role in the result!

A one-way ANOVA example

Let's work through a one-way ANOVA example in more detail. Imagine you work for a company that manufactures an adhesive gel that is sold in small jars. The viscosity of the gel is important: too thick and it becomes difficult to apply; too thin and its adhesiveness suffers. You've received some feedback from a few unhappy customers lately complaining that the viscosity of your adhesive is not as consistent as it used to be. You've been asked by your boss to investigate.

You decide that a good first step would be to examine the average viscosity of the five most recent production lots. If you find differences between lots, that would seem to confirm the issue is real. It might also help you begin to form hypotheses about factors that could cause inconsistencies between lots.

You measure viscosity using an instrument that rotates a spindle immersed in the jar of adhesive. This test yields a measurement called torque resistance. You test five jars selected randomly from each of the most recent five lots. You obtain the torque resistance measurement for each jar and plot the data.

Figure 1: Plot of torque measurements by lot

From the plot of the data, you observe that torque measurements from the Lot 3 jars tend to be lower than the torque measurements from the samples taken from the other lots. When you calculate the means from all your measurements, you see that the mean torque for Lot 3 is 26.77—much lower than the other four lots, each with a mean of around 30.

Table 1: Mean torque measurements from tests of five lots of adhesive

Lot #    N    Mean
1        5    29.65
2        5    30.43
3        5    26.77
4        5    30.42
5        5    29.37

The ANOVA table

ANOVA results are typically displayed in an ANOVA table. An ANOVA table includes:

  • Source: the sources of variation including the factor being examined (in our case, lot), error and total.
  • DF: degrees of freedom for each source of variation.
  • Sum of Squares: sum of squares (SS) for each source of variation along with the total from all sources.
  • Mean Square: sum of squares divided by its associated degrees of freedom.
  • F Ratio: the mean square of the factor (lot) divided by the mean square of the error.
  • Prob > F: the p-value.

Table 2: ANOVA table with results from our torque measurements

Source    DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Lot        4    45.25             11.31          6.90       0.0012
Error     20    32.80              1.64
Total     24    78.05

We'll explain how the components of this table are derived below. One key element in this table to focus on for now is the p-value. The p-value is used to evaluate the validity of the null hypothesis that all the means are the same. In our example, the p-value (Prob > F) is 0.0012. This small p-value can be taken as evidence that the means are not all the same. Our samples provide evidence that there is a difference in the average torque resistance values between one or more of the five lots.
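
If you want to reproduce the key numbers in this table yourself, the sketch below runs a one-way ANOVA on the raw torque measurements (listed in Table 3 further below) with SciPy's f_oneway function; the variable names and layout are just illustrative choices.

```python
# Minimal sketch: one-way ANOVA on the torque measurements from Table 3 using SciPy.
from scipy import stats

lot1 = [29.39, 31.51, 30.88, 27.63, 28.85]
lot2 = [30.63, 32.10, 30.11, 29.63, 29.68]
lot3 = [27.16, 26.63, 25.31, 27.66, 27.10]
lot4 = [31.03, 30.98, 28.95, 31.45, 29.70]
lot5 = [29.67, 29.32, 26.87, 31.59, 29.41]

f_ratio, p_value = stats.f_oneway(lot1, lot2, lot3, lot4, lot5)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")   # roughly F = 6.90, p = 0.0012
```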

What is a p-value?

A p-value is a measure of probability used for hypothesis testing. The goal of hypothesis testing is to determine whether there is enough evidence to support a certain hypothesis about your data. Recall that with ANOVA, we formulate two hypotheses: the null hypothesis that all the means are equal and the alternative hypothesis that the means are not all equal.

Because we’re only examining random samples of data pulled from whole populations, there’s a risk that the means of our samples are not representative of the means of the full populations. The p-value gives us a way to quantify that risk: it is the probability of observing differences among the sample means at least as large as the ones you’ve measured when, in fact, the null hypothesis is true (that is, the full population means are equal).

A small p-value would lead you to reject the null hypothesis. A typical threshold for rejection of a null hypothesis is 0.05. That is, if you have a p-value less than 0.05, you would reject the null hypothesis in favor of the alternative hypothesis that at least one mean is different from the rest.

Based on these results, you decide to hold Lot 3 for further testing. In your report you might write: The torque of five jars of product was measured from each of the five most recent production lots. An ANOVA found that the observations support a difference in mean torque between lots (p = 0.0012). A plot of the data shows that Lot 3 had a lower mean torque (26.77) than the other four lots. We will hold Lot 3 for further evaluation.

Remember, an ANOVA test will not tell you which mean or means differ from the others, and (unlike our example) this isn't always obvious from a plot of the data. One way to answer questions about specific types of differences is to use a multiple comparison test. For example, to compare group means to the overall mean, you can use analysis of means (ANOM). To compare individual pairs of means, you can use the Tukey-Kramer multiple comparison test, as sketched below.
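
As an illustration, such a follow-up Tukey-Kramer (Tukey HSD) comparison of the lot means could be run with statsmodels on the Table 3 data; the variable names and the 0.05 significance level here are illustrative choices, not part of the original analysis.

```python
# Sketch: Tukey-Kramer (Tukey HSD) pairwise comparison of lot means with statsmodels.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

torque = np.array([
    29.39, 31.51, 30.88, 27.63, 28.85,   # Lot 1
    30.63, 32.10, 30.11, 29.63, 29.68,   # Lot 2
    27.16, 26.63, 25.31, 27.66, 27.10,   # Lot 3
    31.03, 30.98, 28.95, 31.45, 29.70,   # Lot 4
    29.67, 29.32, 26.87, 31.59, 29.41,   # Lot 5
])
lot = np.repeat(["Lot 1", "Lot 2", "Lot 3", "Lot 4", "Lot 5"], 5)

result = pairwise_tukeyhsd(endog=torque, groups=lot, alpha=0.05)
print(result.summary())   # the pairs involving Lot 3 are the ones to watch
```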

One-way ANOVA calculation

Now let’s consider our torque measurement example in more detail. Recall that we had five lots of material. From each lot we randomly selected five jars for testing. This is called a one-factor design. The one factor, lot, has five levels. Each level is replicated (tested) five times. The results of the testing are listed below.

Table 3: Torque measurements by Lot

       Lot 1    Lot 2    Lot 3    Lot 4    Lot 5
Jar 1  29.39    30.63    27.16    31.03    29.67
Jar 2  31.51    32.10    26.63    30.98    29.32
Jar 3  30.88    30.11    25.31    28.95    26.87
Jar 4  27.63    29.63    27.66    31.45    31.59
Jar 5  28.85    29.68    27.10    29.70    29.41
Mean   29.65    30.43    26.77    30.42    29.37

To explore the calculations that resulted in the ANOVA table above (Table 2), let's first establish the following definitions:

$n_i$ = Number of observations for treatment $i$ (in our example, Lot $i$)

$N$ = Total number of observations

$Y_{ij}$ = The jth observation on the ith treatment

$\overline{Y}_i$ = The sample mean for the ith treatment

$\overline{\overline{Y}}$ = The mean of all observations (grand mean)

Sum of Squares

With these definitions in mind, let's tackle the Sum of Squares column from the ANOVA table. The sum of squares gives us a way to quantify variability in a data set by focusing on the difference between each data point and the mean of all data points in that data set. The formula below partitions the overall variability into two parts: the variability due to the model or the factor levels, and the variability due to random error.  

$$ \sum_{i=1}^{a}\sum_{j=1}^{n_i}(Y_{ij}-\overline{\overline{Y}})^2\;=\;\sum_{i=1}^{a}n_i(\overline{Y}_i-\overline{\overline{Y}})^2+\sum_{i=1}^{a}\sum_{j=1}^{n_i}(Y_{ij}-\overline{Y}_i)^2 $$

$$ SS(\text{Total}) = SS(\text{Factor}) + SS(\text{Error}) $$

While that equation may seem complicated, focusing on each element individually makes it much easier to grasp. Table 4 below lists each component of the formula and then builds them into the squared terms that make up the sum of squares. The first column of data ($Y_{ij}$) contains the torque measurements we gathered in Table 3 above.

Another way to look at sources of variability: between group variation and within group variation

Recall that in our ANOVA table above (Table 2), the Source column lists two sources of variation: factor (in our example, lot) and error. Another way to think of those two sources is between group variation (which corresponds to variation due to the factor or treatment) and within group variation (which corresponds to variation due to chance or error). So using that terminology, our sum of squares formula is essentially calculating the sum of variation due to differences between the groups (the treatment effect) and variation due to differences within each group (unexplained differences due to chance).  

Table 4: Sum of squares calculation

Lot  $Y_{ij}$  $\overline{Y}_i$  $\overline{\overline{Y}}$  $\overline{Y}_i-\overline{\overline{Y}}$  $Y_{ij}-\overline{\overline{Y}}$  $Y_{ij}-\overline{Y}_i$  $(\overline{Y}_i-\overline{\overline{Y}})^2$  $(Y_{ij}-\overline{Y}_i)^2$  $(Y_{ij}-\overline{\overline{Y}})^2$
1    29.39   29.65   29.33    0.32    0.06   -0.26   0.10   0.07    0.00
1    31.51   29.65   29.33    0.32    2.18    1.86   0.10   3.46    4.75
1    30.88   29.65   29.33    0.32    1.55    1.23   0.10   1.51    2.40
1    27.63   29.65   29.33    0.32   -1.70   -2.02   0.10   4.08    2.89
1    28.85   29.65   29.33    0.32   -0.48   -0.80   0.10   0.64    0.23
2    30.63   30.43   29.33    1.10    1.30    0.20   1.21   0.04    1.69
2    32.10   30.43   29.33    1.10    2.77    1.67   1.21   2.79    7.68
2    30.11   30.43   29.33    1.10    0.78   -0.32   1.21   0.10    0.61
2    29.63   30.43   29.33    1.10    0.30   -0.80   1.21   0.64    0.09
2    29.68   30.43   29.33    1.10    0.35   -0.75   1.21   0.56    0.12
3    27.16   26.77   29.33   -2.56   -2.17    0.39   6.55   0.15    4.71
3    26.63   26.77   29.33   -2.56   -2.70   -0.14   6.55   0.02    7.29
3    25.31   26.77   29.33   -2.56   -4.02   -1.46   6.55   2.14   16.16
3    27.66   26.77   29.33   -2.56   -1.67    0.89   6.55   0.79    2.79
3    27.10   26.77   29.33   -2.56   -2.23    0.33   6.55   0.11    4.97
4    31.03   30.42   29.33    1.09    1.70    0.61   1.19   0.37    2.89
4    30.98   30.42   29.33    1.09    1.65    0.56   1.19   0.31    2.72
4    28.95   30.42   29.33    1.09   -0.38   -1.47   1.19   2.16    0.14
4    31.45   30.42   29.33    1.09    2.12    1.03   1.19   1.06    4.49
4    29.70   30.42   29.33    1.09    0.37   -0.72   1.19   0.52    0.14
5    29.67   29.37   29.33    0.04    0.34    0.30   0.00   0.09    0.12
5    29.32   29.37   29.33    0.04   -0.01   -0.05   0.00   0.00    0.00
5    26.87   29.37   29.33    0.04   -2.46   -2.50   0.00   6.26    6.05
5    31.59   29.37   29.33    0.04    2.26    2.22   0.00   4.93    5.11
5    29.41   29.37   29.33    0.04    0.08    0.04   0.00   0.00    0.01
Sum of Squares (totals of the last three columns): SS (Factor) = 45.25, SS (Error) = 32.80, SS (Total) = 78.05
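
For readers who want to verify this partition numerically, here is a self-contained numpy sketch using the Table 3 measurements; the array layout and variable names are illustrative choices rather than part of the original analysis.

```python
# Sketch: reproduce the sum of squares partition from Table 4 with numpy.
import numpy as np

# Torque measurements from Table 3: one row per lot, one column per jar.
Y = np.array([
    [29.39, 31.51, 30.88, 27.63, 28.85],   # Lot 1
    [30.63, 32.10, 30.11, 29.63, 29.68],   # Lot 2
    [27.16, 26.63, 25.31, 27.66, 27.10],   # Lot 3
    [31.03, 30.98, 28.95, 31.45, 29.70],   # Lot 4
    [29.67, 29.32, 26.87, 31.59, 29.41],   # Lot 5
])

lot_means  = Y.mean(axis=1)   # treatment means, matching Table 1
grand_mean = Y.mean()         # grand mean, about 29.33

ss_factor = (Y.shape[1] * (lot_means - grand_mean) ** 2).sum()   # about 45.25
ss_error  = ((Y - lot_means[:, None]) ** 2).sum()                # about 32.80
ss_total  = ((Y - grand_mean) ** 2).sum()                        # about 78.05

print(ss_factor, ss_error, ss_total)   # SS(Factor) + SS(Error) = SS(Total)
```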

Degrees of Freedom (DF)

Associated with each sum of squares is a quantity called degrees of freedom (DF). The degrees of freedom indicate the number of independent pieces of information used to calculate each sum of squares. For a one-factor design with a factor at k levels (five lots in our example) and a total of N observations (five jars per lot for a total of 25), the degrees of freedom are as follows:

Table 5: Determining degrees of freedom

             Degrees of Freedom (DF) Formula    Calculated Degrees of Freedom
SS (Factor)  k - 1                              5 - 1 = 4
SS (Error)   N - k                              25 - 5 = 20
SS (Total)   N - 1                              25 - 1 = 24
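
In code, these degrees of freedom are just the design sizes; a trivial sketch with k and N taken from our example:

```python
# Degrees of freedom for a one-factor design with k levels and N total observations.
k, N = 5, 25          # five lots, five jars per lot
df_factor = k - 1     # 4
df_error  = N - k     # 20
df_total  = N - 1     # 24
```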

Mean Squares (MS) and F Ratio

We divide each sum of squares by the corresponding degrees of freedom to obtain mean squares. When the null hypothesis is true (i.e., the means are equal), MS (Factor) and MS (Error) are both estimates of error variance and would be about the same size, so their ratio, the F ratio, would be close to one. When the null hypothesis is not true, MS (Factor) will be larger than MS (Error) and their ratio will be greater than one. In our adhesive testing example, the computed F ratio of 6.90 provides strong evidence against the null hypothesis that the means are equal.

Table 6: Calculating mean squares and F ratio

             Sum of Squares (SS)    Degrees of Freedom (DF)    Mean Squares        F Ratio
SS (Factor)  45.25                  4                          45.25/4 = 11.31     11.31/1.64 = 6.90
SS (Error)   32.80                  20                         32.80/20 = 1.64

The ratio of MS (Factor) to MS (Error)—the F ratio—has an F distribution. The F distribution is the distribution of F values that we'd expect to observe when the null hypothesis is true (i.e., the means are equal). F distributions have different shapes based on two parameters, called the numerator and denominator degrees of freedom. For an ANOVA test, the numerator is MS (Factor), so the numerator degrees of freedom are those associated with MS (Factor). The denominator is MS (Error), so the denominator degrees of freedom are those associated with MS (Error).

If your computed F ratio exceeds the critical value from the corresponding F distribution (equivalently, if the p-value is sufficiently small), you would reject the null hypothesis that the means are equal. The p-value in this case is the probability of observing a value greater than the F ratio from the F distribution when in fact the null hypothesis is true.
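
To make that concrete, here is a short sketch that recomputes the mean squares, F ratio, and p-value from the sums of squares and degrees of freedom above, using SciPy's F distribution; the variable names are illustrative.

```python
# Sketch: mean squares, F ratio, and p-value from the quantities in Tables 5 and 6.
from scipy.stats import f

ss_factor, df_factor = 45.25, 4     # k - 1 = 5 - 1
ss_error,  df_error  = 32.80, 20    # N - k = 25 - 5

ms_factor = ss_factor / df_factor   # 11.31
ms_error  = ss_error / df_error     # 1.64
f_ratio   = ms_factor / ms_error    # about 6.90

# p-value: probability of an F value at least this large when the null hypothesis is true
p_value = f.sf(f_ratio, df_factor, df_error)   # about 0.0012
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")
```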

Figure 2: F distribution