Geometric Mean

What is the geometric mean?

The geometric mean of a set of n positive numeric data values is the nth root of the product of the data. The sample geometric mean is an estimate of the center of the data distribution. It is often more useful than the sample arithmetic mean when the data represent rates or percentages.

How do you calculate the geometric mean?

To calculate the geometric mean, multiply all the data values in your sample and take the nth root, where n is the number of observations you have. Let’s start with a simple example of just two numbers, say 4 and 9. The geometric mean is $\sqrt{4 \times 9} = \sqrt{36} = 6$.
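The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `geometric_mean` is chosen here for the example, and the sketch assumes every value is positive.

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n positive values."""
    values = list(values)
    # Assumption for this sketch: reject empty input and non-positive values.
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, and all values must be positive")
    n = len(values)
    return math.prod(values) ** (1 / n)

# The two-number example from the text: sqrt(4 * 9) = sqrt(36) = 6
print(geometric_mean([4, 9]))
```

Python's standard library also provides `statistics.geometric_mean` (Python 3.8 and later), which can be used instead of a hand-written function.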

The reason it’s called the geometric mean is that it has a geometric interpretation. A rectangle with side lengths 4 and 9 produces the same area as a square with side length 6.

With three numbers, the geometric mean gives the side length of a cube that has the same volume as a rectangular box whose side lengths are the three data values.

In general, the geometric mean is the number that, when used as each of n factors in a product, gives the same product as your sample. Similarly, the arithmetic mean is the number that, when used as each of n terms in a sum, gives the same sum as your sample.
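Written as formulas, with $x_1, x_2, \ldots, x_n$ denoting the data values (the symbols $g$ and $\bar{x}$ are shorthand introduced here for the geometric and arithmetic means), the two definitions are:

$$g^n = x_1 \times x_2 \times \cdots \times x_n \quad\text{so}\quad g = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$$

$$n \, \bar{x} = x_1 + x_2 + \cdots + x_n \quad\text{so}\quad \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

For the values 4 and 9, the first line gives $g^2 = 36$, so $g = 6$, and the second gives $2\bar{x} = 13$, so $\bar{x} = 6.5$.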

Since the formula involves multiplication, the geometric mean is often used to summarize compounding rates.

For example, suppose your process involves growing a batch of cells through five treatments. At the end of each treatment step, you measure the growth rate since the beginning of that step. Suppose one batch of cells has growth rates of 18%, 84%, 15%, 63%, and 21% over the five treatments. A growth rate of 18% means that the number of cells was multiplied by 1.18. The geometric mean is $\sqrt[5]{1.18 \times 1.84 \times 1.15 \times 1.63 \times 1.21} \approx 1.3755$. The average growth rate per step is therefore about 37.6%. This is the constant growth rate that would yield the same total number of cells over all five steps.
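The cell growth calculation can be reproduced with a short Python sketch. The variable names are illustrative only; the growth rates are the five values given in the example.

```python
import math

# Growth rates from the example, in percent per treatment step.
growth_rates_pct = [18, 84, 15, 63, 21]

# Convert each rate to a growth factor: 18% growth means multiplying by 1.18.
growth_factors = [1 + rate / 100 for rate in growth_rates_pct]

# Geometric mean: the nth root of the product of the n growth factors.
n = len(growth_factors)
geo_mean_factor = math.prod(growth_factors) ** (1 / n)

print(f"average growth factor per step: {geo_mean_factor:.4f}")
print(f"average growth rate per step:   {(geo_mean_factor - 1) * 100:.2f}%")
```

The factor printed is about 1.3755. Raising it to the fifth power recovers the product of the five original growth factors, which is what makes it the right "average" for compounding growth.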

The arithmetic mean is $\frac{18\% + 84\% + 15\% + 63\% + 21\%}{5} = 40.2\%$. If you use the arithmetic mean instead of the geometric mean, you will think that you have more total cells in the final batch than you really do. Let’s see what that means.

Suppose there are 10,000 cells in the batch at the beginning of the run. Using the geometric mean without rounding, the final number of cells is 10,000 $\times$ $(1.3755\ldots)^5$ $\approx$ 49,246, which is the same as 10,000 $\times$ 1.18 $\times$ 1.84 $\times$ 1.15 $\times$ 1.63 $\times$ 1.21. Using the arithmetic mean, 10,000 $\times$ $(1.402)^5$ $\approx$ 54,168. By using the arithmetic mean, you overestimate the final number of cells by about 10%.
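To see the overestimate numerically, the following Python sketch projects the final cell count three ways: step by step, with the geometric mean, and with the arithmetic mean. The variable names are illustrative. The printed counts use unrounded means, so they can differ slightly from figures computed with rounded values.

```python
import math

start_cells = 10_000
growth_factors = [1.18, 1.84, 1.15, 1.63, 1.21]
n = len(growth_factors)

# What actually happens: apply each step's growth factor in turn.
actual_final = start_cells * math.prod(growth_factors)

# Two candidate "average" growth factors per step.
geo_mean = math.prod(growth_factors) ** (1 / n)
arith_mean = sum(growth_factors) / n

# Project the final count by applying the same average factor n times.
final_from_geo = start_cells * geo_mean ** n
final_from_arith = start_cells * arith_mean ** n

print(f"actual final count:         {actual_final:,.0f}")
print(f"using the geometric mean:   {final_from_geo:,.0f}")
print(f"using the arithmetic mean:  {final_from_arith:,.0f}")
```

Only the geometric mean reproduces the actual final count. The arithmetic mean projects a larger batch than the process really produces.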

What are some properties of the geometric mean?

The geometric mean has the following properties:

Should I use the mean or the geometric mean?

The arithmetic mean is an appropriate measure of central tendency for continuous measurements that are quantities or anything measured on a linear scale. The geometric mean is an appropriate measure of central tendency for rates, proportions, ratios, percentages, frequencies, anything measured on a logarithmic scale*, or data that span multiple orders of magnitude. It is also used when effects are compounding (see the cell growth example above).

*If you choose to take the log of the measurements, use the arithmetic mean to summarize them. If you choose not to take the log, use the geometric mean on the raw data.
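The footnote reflects a general identity: the arithmetic mean of the logged values, transformed back with the exponential function, equals the geometric mean of the raw values. A minimal Python sketch of that equivalence follows. The function name is illustrative, and the data are the growth factors from the cell growth example.

```python
import math

def geometric_mean_via_logs(values):
    """Average the natural logs, then exponentiate back to the original scale."""
    logs = [math.log(v) for v in values]  # requires every value to be > 0
    mean_log = sum(logs) / len(logs)      # arithmetic mean on the log scale
    return math.exp(mean_log)

data = [1.18, 1.84, 1.15, 1.63, 1.21]

# Direct definition: nth root of the product of the raw values.
direct = math.prod(data) ** (1 / len(data))

# The two routes agree up to floating-point rounding.
print(math.isclose(geometric_mean_via_logs(data), direct))  # True
```

The log route is also more robust for long series, because a sum of logs is less likely to overflow or underflow than a product of many raw values.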

Can I use the geometric mean for categorical variables?

No, calculating the geometric mean requires variables to be measured on a continuous scale. Categorical variables are measured on a scale with finite values, usually just a small number of possibilities.

Examples of the geometric mean

If you have JMP on your computer, you can download the JMP data set Univariate Statistics Data.jmp for your own analysis. (If you don't have access to JMP, a free trial is available for download.)

Examine the geometric mean for the data in Univariate Statistics Data.jmp and compare it with the arithmetic mean. For positive data, the geometric mean is always less than or equal to the arithmetic mean, and the two are equal only when all values are the same. The geometric mean is missing if the data contain negative numbers.
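If you want to explore this comparison outside JMP, here is a small Python sketch with made-up samples (they are not taken from Univariate Statistics Data.jmp). Returning `None` to represent a missing value for negative data is a choice made for this illustration. It mimics the behavior described above and is not a description of how JMP computes the statistic.

```python
import math

def geometric_mean_or_missing(values):
    """Return the geometric mean, or None if the data are empty or contain negatives."""
    values = list(values)
    if not values or any(v < 0 for v in values):
        return None  # treated as "missing" in this sketch
    return math.prod(values) ** (1 / len(values))

def arithmetic_mean(values):
    """Return the ordinary (arithmetic) mean."""
    return sum(values) / len(values)

# Hypothetical samples chosen only to illustrate the comparison.
samples = [[4, 9], [5, 5, 5], [1, 10, 100], [-2, 3, 8]]

for sample in samples:
    geo = geometric_mean_or_missing(sample)
    arith = arithmetic_mean(sample)
    print(f"{sample}: geometric = {geo}, arithmetic = {arith}")
```

For the positive samples, the geometric mean never exceeds the arithmetic mean, and the two match (up to floating-point rounding) when all values are identical. The sample with a negative value reports a missing geometric mean.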