Quantiles

What are quantiles?

A quantile of a distribution is a value below which a specified percentage of the data lies. For example, the median is the 50% quantile because 50% of the data lie below it. Other useful quantiles are quartiles (which split the data into four equal parts), quintiles (which split the data into five equal parts), and percentiles (which split the data into 100 equal parts). The 0% quantile of a sample of data is the minimum value, and the 100% quantile is the maximum value.

When should you use quantiles?

You are probably already familiar with quantiles from everyday contexts. If you have children, you’ve undoubtedly seen growth charts that plot the 5%, 10%, 25%, 50%, 75%, 90%, and 95% quantiles of weight and height by age. Another example is the published quantiles for a college entrance exam, where the corresponding percentage is displayed for each score. These are examples of population quantiles, which describe the entire population.

You can also find population quantiles for race times published on running websites. For example, if 10% of runners complete a 10K run in less than about 1 hour, 7 minutes, then that time is the 10% population quantile. Similarly, if you run a race with 100 contestants and finish in fifth place with a time of 1 hour, 10 minutes, your time is approximately the 5% quantile of the sample of 100 contestants.

Sample quantiles

Sample quantiles are used to estimate population quantiles. For example, say you want to estimate the score on a college entrance exam below which 80% of the students scored and above which 20% scored. This is the 80% quantile of the scores. You have score data from the students at your school who took the exam. The 80% quantile of their test scores is an estimate of the 80% quantile of all students who took the exam. The estimate will be good if the sample of students from your school is representative of the population of all students who took the exam. The figure below shows the distribution of test scores with the 80% quantile marked.

Population quantiles

There are other ways to estimate a population quantile. For example, if you can assume a normal distribution for the scores, you can estimate the population quantile using the 80% quantile of the fitted normal distribution.
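As a rough illustration of the normal-distribution approach, the sketch below uses Python's standard-library `statistics.NormalDist`. It assumes the summary statistics quoted in the hypothesis-testing example of this article (mean 81.6667, standard deviation 6.24971) describe the same test scores, and that a normal model is reasonable for them. The variable names are illustrative only.

```python
from statistics import NormalDist

# Assumption: these summary statistics (quoted in the hypothesis-testing
# example) describe the test scores, and a normal model is adequate.
fitted = NormalDist(mu=81.6667, sigma=6.24971)

# The 80% quantile of the fitted normal distribution is the value below
# which the model places 80% of the scores.
q80 = fitted.inv_cdf(0.80)
print(f"Estimated 80% population quantile: {q80:.2f}")
```

Under these assumptions the estimate comes out at roughly 86.9. A sample quantile computed directly from the scores could differ, especially if the scores are not close to normally distributed.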

Quantiles in hypothesis testing

Quantiles of reference distributions are used as critical values in hypothesis testing. You can reject a null hypothesis if the p-value is less than your significance level $\alpha$.

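Equivalently, for an upper-tailed test, the null hypothesis can be rejected if the test statistic is greater than the 100 $\times (1 - \alpha)\%$ quantile of the reference distribution. For a two-sided test, reject if the test statistic is less than the 100 $\times (\alpha/2)\%$ quantile or greater than the 100 $\times (1 - \alpha/2)\%$ quantile.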

Suppose you want to test the hypothesis that the population mean of the test scores is 85, based on this sample of n = 18 test scores, at the 0.05 level of significance. The alternative hypothesis is that the mean is not 85. You perform a one-sample t-test. The test statistic is $\frac{\bar{x} - 85}{s \,/\, \sqrt{n}} = \frac{81.6667 - 85}{6.24971 \,/\, \sqrt{18}} = -2.2628$. The 2.5% quantile of the t(17) distribution is –2.1098.

Since the test statistic, $-2.2628$, is less than the 2.5% quantile, $-2.1098$, you reject the null hypothesis. The figure below shows the value of the test statistic and the 2.5% quantile on a graph of the t(17) density function.
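The arithmetic in this example can be checked with a short Python sketch. The summary statistics and the 2.5% quantile of t(17) are copied from the text; the quantile is not computed here because the Python standard library has no t distribution. The upper critical value relies on the symmetry of the t distribution, and the variable names are illustrative.

```python
import math

# Summary statistics quoted in the text.
n = 18
sample_mean = 81.6667
sample_sd = 6.24971
null_mean = 85

# 2.5% quantile of t(17), copied from the text (not computed here).
t_lower = -2.1098
# The t distribution is symmetric about 0, so the 97.5% quantile is the negation.
t_upper = -t_lower

# One-sample t statistic: (x-bar - mu0) / (s / sqrt(n)).
t_stat = (sample_mean - null_mean) / (sample_sd / math.sqrt(n))

# Two-sided test at the 0.05 level: reject if the statistic falls in either tail.
reject = t_stat < t_lower or t_stat > t_upper
print(f"t = {t_stat:.4f}, reject H0: {reject}")
```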

Quantiles on a box plot

Some important sample quantiles are also shown on the box plot. The five-number summary of the data is the set of the 0%, 25%, 50%, 75%, and 100% quantiles, shown on the box plot. A quantile box plot also shows other quantiles: 2.5%, 10%, 90%, and 97.5%. You can see a box plot, quantile box plot, histogram, and table of some quantiles for data from a bimodal distribution below.

Quantiles in fitting distributions

To assess whether your data come from a normal distribution, you can plot the data on the Y axis and the quantiles from the fitted normal distribution on the X axis, or the other way around. The fitted normal distribution is the normal distribution with the same mean and variance as the data. If the data values do not follow the line y = x, there is evidence that the data are not normally distributed. You can use quantile plots with other distributions, not just the normal distribution. You can see a normal quantile plot for random data generated from a normal distribution and a skewed distribution below. The data are on the X axis, and the quantiles from a normal distribution are on the Y axis. The corresponding probabilities are also displayed on the Y axis.
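The pairing behind a normal quantile plot can be sketched in a few lines of Python. This is a minimal sketch, not the procedure any particular software uses: it borrows the seven basketball heights from the calculation example in this article, assigns each ordered value the probability r / (n + 1), and assumes the fitted normal uses the sample standard deviation (`statistics.stdev`).

```python
from statistics import NormalDist, mean, stdev

# Heights (inches) from the basketball example in this article.
heights = [74, 75, 72, 73, 73, 88, 74]

xs = sorted(heights)
n = len(xs)

# Fitted normal: same mean as the data; sample standard deviation is an assumption.
fitted = NormalDist(mu=mean(xs), sigma=stdev(xs))

# Pair each ordered value with the fitted-normal quantile at p = r / (n + 1).
pairs = []
for r, x in enumerate(xs, start=1):
    p = r / (n + 1)
    pairs.append((fitted.inv_cdf(p), x))

# Points far from the line y = x suggest a departure from normality.
for q, x in pairs:
    print(f"normal quantile {q:6.2f}   data {x}")
```

In this small sample the largest height, 88, sits well above its fitted normal quantile, which is the kind of departure from the line y = x that the plot is meant to reveal.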

Quantiles in contour plots

When using contour plots or density estimation plots, it can be useful to change contour levels at sample quantiles. For example, in the graph below, two variables in cytometry have been plotted against each other. A kernel density estimator has been fit, and the 5% quantiles of that density estimator are displayed. The data are most dense in the three regions where the contours are colored red or orange; they are least dense where the contours are colored blue or purple.

How do you calculate quantiles?

To compute the pth quantile of a sample, arrange the n values of the sample in order from smallest to largest. Denote these values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$. Next, compute the rank number of the pth quantile as $r = \frac{(n+1)p}{100}$.

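Find the quantile as follows:

If $r$ is an integer, the pth quantile is $x_{(r)}$.

If $r$ is not an integer, let $i$ be its integer part and $f$ its fractional part. The pth quantile is the linear interpolation $(1 - f)\,x_{(i)} + f\,x_{(i+1)}$.

If $r < 1$, the pth quantile is the minimum value, $x_{(1)}$. If $r > n$, it is the maximum value, $x_{(n)}$.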

As an example, consider the heights in inches of seven basketball players.

74, 75, 72, 73, 73, 88, 74

First, order the data.

72, 73, 73, 74, 74, 75, 88

Next, find the percentages corresponding to each ordered value as $p = \frac{r(100)}{n+1} = \frac{100r}{8}$.

12.5%, 25%, 37.5%, 50%, 62.5%, 75%, 87.5%
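The rank formula, together with the linear interpolation used in the cereal examples later in this article, can be sketched in Python as follows. The function name `sample_quantile` is hypothetical, and other software may use a different quantile definition.

```python
import math

def sample_quantile(values, p):
    """Return the p% quantile (0 <= p <= 100) using rank r = (n + 1) * p / 100."""
    xs = sorted(values)
    n = len(xs)
    r = (n + 1) * p / 100
    if r <= 1:
        return xs[0]      # rank at or below 1: the minimum
    if r >= n:
        return xs[-1]     # rank at or above n: the maximum
    i = math.floor(r)     # integer part of the rank
    f = r - i             # fractional part of the rank
    # Interpolate between the i-th and (i+1)-th ordered values (1-based ranks).
    return (1 - f) * xs[i - 1] + f * xs[i]

heights = [74, 75, 72, 73, 73, 88, 74]
for p in (12.5, 25, 37.5, 50, 62.5, 75, 87.5):
    print(f"{p}% quantile: {sample_quantile(heights, p)}")
```

For these seven heights, each listed percentage maps to an integer rank, so the function returns the ordered values themselves without interpolating.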

Statistical software, like JMP, can calculate quantiles. Different software packages may use formulas that differ from the one above, so check the software’s documentation to find which formula is used.

Examples of quantiles

If you have JMP on your computer, you can download the JMP data sets Cereal.jmp and Univariate Statistics Data.jmp for your own analysis. (If you don't have access to JMP, a free trial is available for download.)

For the cereal calories data, you can see some common quantiles in the figure below.

Let’s use the formula to verify some of these values: the median, the 75% quantile, and the 0.5% quantile.

There are 76 observations in the data set. The median, or 50% quantile, has rank (76 + 1) (50) / 100 = 38.5. Remember that the rank is the position in 1, …, n of an observation after you have ordered the data. Since this rank is not an integer, linearly interpolate between the 38th and 39th observations. Both values are 120, so their average is 120 and the median is 120.

The 75% quantile is the third quartile. It has rank (77) (75) / 100 = 57.75. The 57th observation is 190, and the 58th observation is 200. The fractional part of the rank is 0.75, so the 75% quantile is (0.25) (190) + (0.75) (200) = 197.5.

The 0.5% quantile has rank (77) (0.5) / 100 = 0.385 < 1, so the 0.5% quantile is the minimum value, 50.
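The three hand calculations above can be reproduced with a few lines of Python. This sketch uses only the numbers quoted in the text (76 observations, with 190 and 200 as the 57th and 58th ordered values); the helper name `rank` is illustrative.

```python
n = 76  # number of observations, as stated in the text

def rank(p, n):
    """Rank of the p% quantile: r = (n + 1) * p / 100."""
    return (n + 1) * p / 100

median_rank = rank(50, n)
q75_rank = rank(75, n)
low_rank = rank(0.5, n)

# Interpolate between the 57th (190) and 58th (200) ordered values.
frac = q75_rank - int(q75_rank)
q75 = (1 - frac) * 190 + frac * 200

print(median_rank, q75_rank, low_rank, q75)
```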

As another example, let’s examine the quantiles for the data in the Univariate Statistics Data. You can visualize the quantiles using the quantile box plot and see the value of some quantiles in the Quantiles report.