Median
What is the median?
The median of a set of numeric data values is the middle of those values. It represents the 50th percentile and the second quartile of the data. Half the data are above and half the data are below the median. It is used instead of the mean to describe the center of the data when the data are skewed or have other unusual shapes.
How do you calculate the median?
To calculate the median, order the values. If there is an odd number of values, the median is the value in the middle of the ordered values. If there is an even number of values, the median is the average of the two values in the middle of the ordered values.
For example, if you are measuring the height of basketball players in inches, and your measurements are 63, 69, 72, 73, and 74 inches, then the median is 72 inches. If there were six players with heights of 63, 69, 72, 73, 74, and 84 inches, the median would be (72 + 73) / 2 = 72.5 inches.
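The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name median_by_hand and the variable names are made up for this example, and the standard library's statistics.median is used only as a cross-check.

```python
from statistics import median


def median_by_hand(values):
    """Return the median using the order-then-pick-the-middle rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the single middle value.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Heights in inches from the basketball example (illustrative names).
five_players = [63, 69, 72, 73, 74]
six_players = [63, 69, 72, 73, 74, 84]

print(median_by_hand(five_players))  # 72
print(median_by_hand(six_players))   # 72.5

# Cross-check against the standard library.
print(median(five_players), median(six_players))
```

Sorting first is what makes the middle position meaningful; the input order of the measurements does not matter.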
What are some properties of the median?
The sample median has these properties:
- It is a point where at least 50% of the data values are at or above that point and at least 50% are at or below it. A data set can have more than one such point. For example, if the number of data values is even, any number between the two middle values could be a median. It is only by convention that we typically take the average of the two middle values.
- The median is an important sample quantile. It is the second quartile and 50th percentile.
- It is the least absolute value estimate. The sum of the absolute deviations of the values from the median is minimized: no other point gives a smaller sum. Least absolute value estimators are also called L1 estimators or minimum absolute deviation (MAD) estimators.
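- The median is the 50%-trimmed mean. It is the limiting case of the trimmed mean, in which values are trimmed from both ends of the ordered data until only the middle remains.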
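Two of the properties above, that the median minimizes the sum of absolute deviations and that an even-sized data set has a whole range of medians, can be checked numerically. The sketch below reuses the basketball heights; the helper name sum_abs_dev and the candidate points are chosen only for illustration.

```python
def sum_abs_dev(values, point):
    """Sum of absolute deviations of the values from a candidate center."""
    return sum(abs(v - point) for v in values)


# Heights in inches from the basketball example (illustrative names).
five_players = [63, 69, 72, 73, 74]
six_players = [63, 69, 72, 73, 74, 84]

# Odd number of values: the sum is smallest at the median, 72.
for candidate in (71, 72, 73):
    print(candidate, sum_abs_dev(five_players, candidate))

# Even number of values: every point from 72 to 73 gives the same
# smallest sum, so each of them qualifies as a median.
for candidate in (71, 72, 72.5, 73, 74):
    print(candidate, sum_abs_dev(six_players, candidate))
```

For the five players the sum is 15 at 72 and 16 at both 71 and 73. For the six players the sum is 27 at 72, 72.5, and 73, and 29 at 71 and 74, which is why averaging the two middle values is a convention and not a necessity.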
Should I use the mean or the median?
The mean measures the average value of a data set. In the basketball player height example, the mean weights each of the five values equally. The median takes the order of all the data into account but uses only the value in the middle. If the height values are 63, 69, 72, 73, and 74, the mean is 70.2 and the median is 72. Even if we replace the 74-inch player with an 84-inch player, the median remains at 72, while the mean rises to 72.2.
For skewed data or data with extreme values, the median is often a more representative value of the center of the data.
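The effect of a single extreme value on the two measures can be reproduced with Python's standard statistics module. This is a small sketch using the heights from the example; the variable names are illustrative.

```python
from statistics import mean, median

# Heights in inches; the second list swaps the 74-inch player
# for an 84-inch player (illustrative names).
original = [63, 69, 72, 73, 74]
with_tall_player = [63, 69, 72, 73, 84]

for label, heights in (("original", original),
                       ("with tall player", with_tall_player)):
    print(label, "mean:", mean(heights), "median:", median(heights))
```

The mean moves by 2 inches because every value contributes to it, while the median stays at 72 because the middle of the ordered data has not changed.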
Can I use the median for categorical variables?
You can use the median for ordinal data but not for nominal data. Ordinal variables are categorical variables with order to the categories, like the size of popcorn you buy at the movie theater: small, medium, or large. Nominal variables are categorical variables with no inherent order to the categories, like the top 10 grossing movies over the weekend.
Suppose you and five friends go to the movies, and each of you buys popcorn. The sizes of popcorn for the six of you are large, large, medium, small, large, large. The ordered values are small, medium, large, large, large, large. The median value is large.
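If one friend had ordered medium instead of large, the ordered values would be small, medium, medium, large, large, large. The two middle values would then be medium and large, so the median would fall between medium and large.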
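One way to handle ordinal data in code is to map each category to its rank and report the lower and upper middle values instead of averaging them. The sketch below does this with statistics.median_low and statistics.median_high from the Python standard library; the names SIZES, RANK, and ordinal_median are hypothetical and exist only for this example.

```python
from statistics import median_high, median_low

# Ordered categories, smallest first (an assumption of this example).
SIZES = ["small", "medium", "large"]
RANK = {size: position for position, size in enumerate(SIZES)}


def ordinal_median(labels):
    """Return the (lower, upper) middle categories of ordinal labels."""
    ranks = [RANK[label] for label in labels]
    return SIZES[median_low(ranks)], SIZES[median_high(ranks)]


six_orders = ["large", "large", "medium", "small", "large", "large"]
one_changed = ["large", "medium", "medium", "small", "large", "large"]

print(ordinal_median(six_orders))   # ('large', 'large')
print(ordinal_median(one_changed))  # ('medium', 'large')
```

When the two results agree, the median is that category. When they differ, the median lies between the two categories, and reporting both avoids pretending that the categories can be averaged.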
Examples of the median
Examine the median for the Univariate Statistics Data. Is it the value that you would have guessed for the center based on the histogram of the data?