Mean
What is the mean?
The mean of a set of numeric data values is the average of those values. The sample mean is an estimator of the population mean. It is sometimes called the arithmetic mean.
How do you calculate the mean?
To calculate the mean, add up all the data values in your sample and divide by how many data values you have. For example, let’s say you are measuring the height of basketball players in inches, and your measurements are 63, 69, 72, 73, and 74 inches.
For the basketball players, the sum of the heights is 63 + 69 + 72 + 73 + 74 = 351, and the mean is 351/5 = 70.2 inches.
We use the symbol $ \bar{x} $ to indicate the sample mean. In mathematical notation, the sample mean is $ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i $. Here, $ n $ is the sample size (five in the example above), the summation sign $ \sum $ means you add together the terms that follow, from the starting index ($ i = 1 $, written below the sign) to the ending index ($ n $, written above it), and $ x_i $ is the $ i $th data value.
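A minimal Python sketch of the formula above, assuming the data fit in a list. `sample_mean` is a hypothetical helper name chosen for illustration; the standard library's `statistics.fmean` is used only as a cross-check.

```python
from statistics import fmean


def sample_mean(values):
    """Return x-bar = (1/n) * (x_1 + x_2 + ... + x_n) for a non-empty list of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    total = 0.0
    for x in values:  # the summation from i = 1 to n
        total += x
    return total / n


heights = [63, 69, 72, 73, 74]  # basketball player heights in inches
print(sample_mean(heights))  # 70.2
print(fmean(heights))        # 70.2 (standard-library cross-check)
```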
What are some properties of the mean?
The sample mean has these properties:
- It is the balance point. The deviations of the sample values from the sample mean sum to zero: $ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 $. The sample mean is also the first moment of the sample.
- It is the least-squares estimate. The sum of squared deviations $ \sum_{i=1}^{n} (x_i - c)^2 $ is smallest when $ c = \bar{x} $; measuring the deviations from any other value $ c $ gives a larger sum.
- It is the maximum likelihood estimator of the population mean when the data are drawn from a normal distribution. Of all candidate values for the true mean, it is the one that makes the data you collected most likely.
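The first two properties can be checked numerically on the basketball heights. This is an illustrative sketch: the helper `sum_sq_dev` and the grid of candidate centers (60 to 80 inches in steps of 0.1) are assumptions made for the demonstration, not a standard procedure. The check also covers the third property indirectly: for normal data with a fixed standard deviation $ \sigma $, the log-likelihood of a candidate mean $ c $ is a constant minus $ \sum_{i=1}^{n} (x_i - c)^2 / (2\sigma^2) $, so the least-squares minimizer is also the maximum likelihood estimate.

```python
heights = [63, 69, 72, 73, 74]  # basketball player heights in inches
n = len(heights)
x_bar = sum(heights) / n  # 70.2

# Balance point: deviations from the mean sum to zero.
# Prints 0 or a value within floating-point rounding of 0.
deviations = [x - x_bar for x in heights]
print(sum(deviations))


def sum_sq_dev(values, c):
    """Sum of squared deviations of values about a candidate center c."""
    return sum((x - c) ** 2 for x in values)


# Least squares: search an (assumed) grid of candidate centers.
# The smallest sum of squared deviations lands on the grid point at the mean.
candidates = [60 + 0.1 * k for k in range(201)]  # 60.0 to 80.0 in steps of 0.1
best = min(candidates, key=lambda c: sum_sq_dev(heights, c))
print(round(best, 1))  # 70.2

# Minimized sum at the mean: 7.2^2 + 1.2^2 + 1.8^2 + 2.8^2 + 3.8^2 = 78.8
print(round(sum_sq_dev(heights, x_bar), 6))  # 78.8
```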
How do extreme values affect the mean?
Each data value is equally weighted in the calculation of the sample mean. Therefore, an extreme value can greatly affect the sample mean.
Consider the basketball player heights again. In the original example, the mean of 63, 69, 72, 73, and 74 is 70.2 inches. Suppose instead that the tallest player were 7 feet tall: 84 inches instead of 74 inches. Now the mean of 63, 69, 72, 73, and 84 is 361/5 = 72.2 inches. Changing that single value raises the mean by 2 inches.
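The effect of the 84-inch player can be reproduced with Python's standard `statistics.mean`. Because every value carries weight $ 1/n $, raising one value by 10 inches raises the mean by $ 10/5 = 2 $ inches.

```python
from statistics import mean

original = [63, 69, 72, 73, 74]
with_tall_player = [63, 69, 72, 73, 84]  # tallest player is 7 feet (84 inches)

m_original = mean(original)
m_tall = mean(with_tall_player)
print(m_original, m_tall)  # 70.2 72.2

# Each value carries weight 1/n, so a change of 10 inches in one value
# moves the mean by 10 / n = 10 / 5 = 2 inches.
shift = m_tall - m_original
print(round(shift, 6))  # 2.0
```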
Does this mean you should remove extreme values before calculating the mean? No! Extreme values are part of the data. The key is to determine whether they actually belong to the distribution you are interested in measuring, that is, whether they are outliers. If you determine that they do not belong, remove them. You may decide to report the mean both with and without the extreme values. You can also consider other measures of central tendency, such as the median.
Should I use the mean or the median?
The mean measures the average value of a data set. In the basketball player height example, the mean weights each of the five values equally. The median takes all the data into account when ordering it, but uses only the middle value. If the heights are 63, 69, 72, 73, and 74, the mean is 70.2 and the median is 72. If we replace the 74-inch player with an 84-inch player, the median stays at 72, while the mean rises to 72.2.
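A short sketch comparing the two summaries with Python's standard `statistics.mean` and `statistics.median` on both versions of the height data:

```python
from statistics import mean, median

samples = {
    "original": [63, 69, 72, 73, 74],
    "84-inch player": [63, 69, 72, 73, 84],
}

# Compute (mean, median) for each version of the data.
summary = {name: (mean(data), median(data)) for name, data in samples.items()}

for name, (m, med) in summary.items():
    print(f"{name}: mean = {m:.1f}, median = {med}")
# original: mean = 70.2, median = 72
# 84-inch player: mean = 72.2, median = 72
```

Only the mean moves when the largest value changes, because the median depends on the largest value only through its position in the ordering.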
The mean is often used to describe the center of a symmetric distribution. The median is often used for a skewed distribution, because the values in the long tail pull the mean toward them while the median is barely affected.
Can I use the mean for categorical variables?
No. Calculating the mean requires numeric values for which adding and dividing make sense, such as measurements on a continuous scale. Categorical variables instead take one of a finite set of categories, usually just a small number of possibilities, so there is no meaningful sum to average.
Examples of the mean
Examine the mean for the data in the Univariate Statistics Data. Is it the value that you would have guessed for the center based on the histogram of the data?