Alpha-Trimmed Mean

What is an $\alpha$-trimmed mean?

An $\alpha$-trimmed mean of a set of numeric data values is the mean of the data that remain after the largest $\alpha$% and the smallest $\alpha$% of the values are removed. It is a robust estimator of a population mean, because the trimming discards the extreme values that would otherwise pull the estimate toward one tail.
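As a sketch in symbols, write the sorted values as $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ and let $k = \lfloor n\alpha/100 \rfloor$ be the number of values dropped from each end, with $\alpha < 50$. The notation and the choice to round a fractional count down are assumptions made here for illustration; software packages may handle fractional counts differently.

$\bar{x}_{\alpha} = \frac{1}{n - 2k} \sum_{i=k+1}^{n-k} x_{(i)}$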

A familiar example of a trimmed mean comes from figure skating competitions, where the highest and lowest judges’ scores are removed before the average score is computed.

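Trimmed means are not commonly used in routine data analysis. Still, they are closely related to two familiar statistics: the sample mean is the 0%-trimmed mean and the median is the 50%-trimmed mean, so you can think of a trimmed mean as a compromise between the sample mean and the sample median.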

When should you use an $\alpha$-trimmed mean?

An $\alpha$-trimmed mean is typically used when you want to remove the effect of extreme values from the calculation of the mean. It is the mean of the middle $(100 - 2\alpha)$% of the data. You can compare the mean and the $\alpha$-trimmed mean for various values of $\alpha$ to find one that represents the center of your distribution well.

Trimming reduces the impact of extremely small or extremely large data values. In the presence of outliers, it can produce a more reliable estimate of the population mean than the sample mean.
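The comparison across several values of $\alpha$ can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `trimmed_mean`, the rule of rounding a fractional count down, and the sample values are all assumptions made for this example.

```python
import math
import statistics


def trimmed_mean(values, alpha_percent):
    """Mean after dropping alpha_percent% of the points from each end."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: a fractional number of points to drop is rounded down.
    k = math.floor(n * alpha_percent / 100)
    if 2 * k >= n:
        raise ValueError("alpha_percent trims away every value")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Hypothetical sample: most values near 200, two extreme values in the upper tail.
sample = [188, 192, 195, 197, 198, 199, 200, 201, 202, 203,
          204, 205, 206, 208, 210, 212, 215, 220, 340, 415]

print("mean:", round(statistics.mean(sample), 2))
for alpha in (5, 10, 20, 25):
    print(f"{alpha}%-trimmed mean:", round(trimmed_mean(sample, alpha), 2))
print("median:", statistics.median(sample))
```

In this made-up sample, the sample mean sits above most of the data, while the trimmed means move toward the median as $\alpha$ grows.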

How do you calculate an $\alpha$-trimmed mean?

To calculate an $\alpha$-trimmed mean, order the data and exclude the top and bottom $\alpha$% of the data. For example, suppose the heights in inches of players on a basketball team are 63, 69, 72, 73, and 84. The 20% trimmed mean is found by removing the highest and lowest 20% of the values. With five values, 20% corresponds to one value, so you remove the minimum (63) and the maximum (84) and then average the three middle values.

$20\%\text{-trimmed mean} = \frac{69 + 72 + 73}{3} \approx 71.33$
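The same steps (order, count, drop, average) can be written out in a few lines of Python. The variable names are illustrative, and rounding the count down is an assumption that does not matter here because 20% of five values is exactly one.

```python
import math

heights = [63, 69, 72, 73, 84]  # heights in inches from the example above
alpha_percent = 20

# Step 1: order the data.
ordered = sorted(heights)

# Step 2: count how many values to drop from each end (assumed: round down).
k = math.floor(len(ordered) * alpha_percent / 100)

# Step 3: drop k values from the bottom and k from the top.
kept = ordered[k:len(ordered) - k]

# Step 4: average what remains.
trimmed = sum(kept) / len(kept)

print(kept, round(trimmed, 2))  # [69, 72, 73] 71.33
```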

By definition, the trimmed mean removes the specified $\alpha$% from the top and $\alpha$% from the bottom of the data, for a total of $2\alpha$% removed. Some spreadsheet software packages instead treat the specified percentage as the total amount to remove, so that only half of it comes off each end. Check how your software defines the trimming percentage before relying on its result.
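To see why the convention matters, the sketch below applies both readings of "20%" to the basketball heights. The `per_tail` flag is a hypothetical parameter invented for this illustration, and the "total" branch is only a simple model of that convention, not the formula of any particular package.

```python
import math


def trimmed_mean(values, percent, per_tail=True):
    """Trimmed mean under two conventions for what `percent` means.

    per_tail=True:  drop `percent`% from each end (the definition used here).
    per_tail=False: drop `percent`% in total, half from each end.
    """
    ordered = sorted(values)
    n = len(ordered)
    share_per_end = percent / 100 if per_tail else percent / 100 / 2
    # Assumption: a fractional number of points to drop is rounded down.
    k = math.floor(n * share_per_end)
    if 2 * k >= n:
        raise ValueError("percent trims away every value")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


heights = [63, 69, 72, 73, 84]

per_tail_20 = trimmed_mean(heights, 20)                # drops 63 and 84
total_20 = trimmed_mean(heights, 20, per_tail=False)   # drops nothing
total_40 = trimmed_mean(heights, 40, per_tail=False)   # drops 63 and 84

print(round(per_tail_20, 2), round(total_20, 2), round(total_40, 2))
```

Under the "total" reading, 20% of five values is a single value, which cannot be split between the two ends, so nothing is trimmed and the result is the ordinary mean. You would need to request 40% to reproduce the 20%-trimmed mean defined above.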

Should I use the mean or an $\alpha$-trimmed mean?

The mean uses every value in a data set, so it can be pulled in one direction or another by extreme values or outliers. If you know your data contain extreme values or outliers, consider using an $\alpha$-trimmed mean to summarize the center of the data instead.

Similarly, if you know the distribution of the data is heavy-tailed, an $\alpha$-trimmed mean can be a good choice for measuring the typical value of the data, since it is minimally affected by points in the tails.

How much data should I trim?

How much data to trim depends on how many extreme values or outliers there are in your data, or on how heavy the tails of the distribution are. For a process with many extreme values, you might choose to remove as much as 50% of the data (25%-trimmed mean). For a process with few extreme values, you might choose to remove as little as 10% of the data (5%-trimmed mean). There is no formula that tells you the “right” value of $\alpha$.

Can I use an $\alpha$-trimmed mean for categorical variables?

No. Calculating the mean, and therefore the trimmed mean, requires variables to be measured on a continuous scale. Categorical variables are measured on a scale with a finite set of values, usually just a small number of possibilities.

Examples of an $\alpha$-trimmed mean

The figure below shows a histogram and summary statistics for different samples drawn from a heavy-tailed population whose true center is 200. When there are extreme values in one tail, the sample mean is pulled toward that tail. The 10%-trimmed mean is more representative of the center of the data.

If you have JMP on your computer, you can download the JMP data set Univariate Statistics Data.jmp for your own analysis.

Examine the 20%-trimmed mean for the data in Univariate Statistics Data.jmp, and compare it to the mean and the median.