Kurtosis

What is kurtosis?

Kurtosis is a statistic used to describe the shape of a distribution of continuous data. It measures the tendency of the data to be distributed toward the tails, or ends, of the distribution versus toward the middle. If your distribution has heavier tails than a normal distribution, the kurtosis is positive; if it has lighter tails than a normal distribution, the kurtosis is negative. Kurtosis does not measure shape in the center of the distribution or how peaked a distribution is; it measures only how heavy the tails are compared to a normal distribution’s tails.

When should you use kurtosis?

Kurtosis measures the tail-heaviness of a set of data. This measure is particularly useful when you expect a normal distribution and your data look symmetric, but not normal, in a histogram or box plot.

Kurtosis can be calculated for continuous data or for count (discrete numeric) data.

A distribution with positive kurtosis is called leptokurtic, since “lepto” means slender, referring to the peak. A distribution with negative kurtosis is called platykurtic, since “platy” means broad, again referring to the peak. These terms mistakenly put the focus of kurtosis on the peak of the distribution instead of the tails. In fact, kurtosis does not measure the shape of the middle of the distribution; it measures only its tail-heaviness.
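To make the two labels concrete, here is a minimal sketch that estimates excess kurtosis for simulated samples from three distributions whose population excess kurtosis is known (normal 0, uniform -1.2, Laplace 3) and labels them by sign. The helpers `excess_kurtosis` and `label` are hypothetical names; the estimator is the plain moment form without any small-sample correction, and the tolerance band `tol` is an arbitrary assumption, not a standard cutoff.

```python
import random

def excess_kurtosis(xs):
    """Moment estimate of excess kurtosis, m4 / m2**2 - 3 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

def label(k, tol=0.2):
    """Hypothetical rule: treat |excess kurtosis| <= tol as roughly normal tails."""
    if k > tol:
        return "leptokurtic"
    if k < -tol:
        return "platykurtic"
    return "roughly normal tails"

rng = random.Random(1)  # fixed seed so the simulation is repeatable
n = 50_000
samples = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    "uniform": [rng.uniform(-1, 1) for _ in range(n)],
    # A Laplace variate is the difference of two independent exponential variates.
    "laplace": [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)],
}

for name, xs in samples.items():
    k = excess_kurtosis(xs)
    print(f"{name:8s} excess kurtosis = {k:+.2f} -> {label(k)}")
```

The estimates should land near the population values, so the Laplace sample comes out leptokurtic and the uniform sample platykurtic, even though neither label says anything about how sharp the peak looks.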

It is difficult to visualize kurtosis with a density function because the tails of a distribution generally carry much less probability than the center, so differences in the tails are hard to see. It is easier to visualize kurtosis with a normal quantile plot, which plots the quantiles of the data against the corresponding quantiles of a normal distribution. If the distribution is normal, the plot displays a straight line, as in the left figure below. If the distribution has heavier tails than a normal distribution (kurtosis > 0), the plot has an inverted S-shape, resembling the center figure below. If the distribution has lighter tails than a normal distribution (kurtosis < 0), the plot is S-shaped, resembling the right figure below. The red line in the figures represents the line y = x; it is obscured in the normal distribution figure on the left.
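The points of a normal quantile plot can be computed without any plotting library. The sketch below standardizes the data, sorts it, and pairs each value with the standard normal quantile at an assumed plotting position of (i + 0.5)/n; statistical packages use slightly different positions. As a single-number summary of the upper tail it reports how far the top 1% of points sit above or below the y = x line. The function names, the plotting-position convention, and the 1% cutoff are illustrative assumptions, and the data are simulated.

```python
import random
from statistics import NormalDist, fmean, pstdev

def normal_quantile_points(xs):
    """(normal quantile, standardized sample quantile) pairs, sorted ascending."""
    n = len(xs)
    mean, sd = fmean(xs), pstdev(xs)
    z = sorted((x - mean) / sd for x in xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions (i + 0.5) / n for 0-based i; packages differ slightly.
    return [(std_normal.inv_cdf((i + 0.5) / n), z[i]) for i in range(n)]

def upper_tail_gap(points, frac=0.01):
    """Mean of (sample quantile - normal quantile) over the top `frac` of points.

    Positive: the upper tail bends above the y = x line (heavier tail).
    Negative: it bends below the line (lighter tail).
    """
    k = max(1, int(len(points) * frac))
    return fmean(s - q for q, s in points[-k:])

rng = random.Random(7)
n = 20_000
data = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    "heavier (Laplace)": [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)],
    "lighter (uniform)": [rng.uniform(-1, 1) for _ in range(n)],
}
for name, xs in data.items():
    print(f"{name:18s} upper-tail gap = {upper_tail_gap(normal_quantile_points(xs)):+.2f}")
```

Plotting the pairs from `normal_quantile_points` against each other reproduces the straight line, inverted S, and S shapes described above; the gap statistic only checks the upper end, and by symmetry the lower end bends the opposite way.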

It is important to note that data points in the tails of a distribution are not necessarily outliers. Outliers are observations that do not belong to the distribution; they belong to another distribution. Tail points, by contrast, are unlikely values that do belong to the distribution.

Many hypothesis tests that assume normality are robust to departures from normality as long as the distribution is symmetric. To judge whether a kurtosis value is of concern, you can construct a bootstrap confidence interval for the excess kurtosis and check whether it contains zero.
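A minimal percentile-bootstrap sketch of that check follows. It assumes the small-sample-adjusted excess kurtosis estimator often written G2 (JMP's exact definition should be checked in its own documentation), resamples the data with replacement, and takes the 2.5th and 97.5th percentiles of the resampled statistics. `sample_excess_kurtosis` and `bootstrap_ci` are hypothetical helper names, and the data are simulated Laplace values standing in for real measurements.

```python
import random

def sample_excess_kurtosis(xs):
    """Small-sample-adjusted excess kurtosis (G2 form); needs n >= 4 and non-constant data."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    sum4 = sum((x - mean) ** 4 for x in xs)
    a = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a * sum4 / s2 ** 2 - b

def bootstrap_ci(xs, stat, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample with replacement, take quantiles of stat."""
    rng = random.Random(seed)
    n = len(xs)
    boots = sorted(stat(rng.choices(xs, k=n)) for _ in range(reps))
    alpha = 1 - level
    lo = boots[round(alpha / 2 * reps)]
    hi = boots[round((1 - alpha / 2) * reps) - 1]
    return lo, hi

rng = random.Random(42)
data = [rng.expovariate(1) - rng.expovariate(1) for _ in range(200)]  # simulated Laplace data
lo, hi = bootstrap_ci(data, sample_excess_kurtosis)
print(f"excess kurtosis = {sample_excess_kurtosis(data):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
print("CI contains 0" if lo <= 0 <= hi else "CI excludes 0: tails differ from normal")
```

Percentile intervals for kurtosis can be unreliable in small or very heavy-tailed samples, because a few extreme values dominate every resample; bias-corrected (BCa) intervals are a common refinement.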

Kurtosis is the fourth standardized moment: the fourth central moment (the average of the centered data raised to the fourth power) divided by the square of the variance. Because the centered data are raised to the fourth power, a value near the center of the distribution contributes very little compared to a value far from the mean. This is why kurtosis measures tail-heaviness, not peakedness, of a distribution. Statistical software, like JMP, often reports the excess kurtosis, which is defined as kurtosis minus 3 (with a correction for small sample sizes). A normal distribution has a kurtosis of 3 and an excess kurtosis of zero.
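The fourth-power weighting is easy to check numerically. For an exact normal distribution, values more than two standard deviations from the mean make up about 4.6% of the probability but supply about 55% of the fourth moment. The sketch below computes the (non-excess) kurtosis of a simulated normal sample and the share of the sum of z^4 coming from points with |z| > 2. The function names and the two-standard-deviation cutoff are illustrative choices.

```python
import random
from statistics import fmean, pstdev

def kurtosis(xs):
    """Standardized fourth central moment m4 / m2**2 (3 for a normal distribution)."""
    mean = fmean(xs)
    m2 = fmean([(x - mean) ** 2 for x in xs])
    m4 = fmean([(x - mean) ** 4 for x in xs])
    return m4 / m2 ** 2

def fourth_power_share(xs, z_cut=2.0):
    """Return (share of sum of z**4 from |z| > z_cut, share of points with |z| > z_cut)."""
    mean, sd = fmean(xs), pstdev(xs)
    z4 = [((x - mean) / sd) ** 4 for x in xs]
    tail = [v for v in z4 if v > z_cut ** 4]  # |z| > z_cut  <=>  z**4 > z_cut**4
    return sum(tail) / sum(z4), len(tail) / len(xs)

rng = random.Random(3)
xs = [rng.gauss(0, 1) for _ in range(100_000)]
k = kurtosis(xs)
moment_share, point_share = fourth_power_share(xs)
print(f"kurtosis = {k:.2f}, excess kurtosis = {k - 3:+.2f}")
print(f"{point_share:.1%} of the points supply {moment_share:.0%} of the fourth moment")
```

Points near the mean have z^4 well below 1, so even though they are the majority of the data they barely move the statistic; the tails decide it.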

How do you calculate kurtosis?

The formula for kurtosis is complicated and not generally calculated by hand. Instead, use statistical software, as displayed below.
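For reference, the population definition and one common small-sample-adjusted estimator of excess kurtosis (often written G2) are shown below, with x̄ the sample mean and s the sample standard deviation. Treat it as an assumption that JMP's Summary Statistics value uses exactly this adjustment; JMP's documentation gives its precise definition.

```latex
\operatorname{Kurt}[X] = \frac{\mathbb{E}\left[(X-\mu)^4\right]}{\sigma^4},
\qquad
\text{excess kurtosis} = \operatorname{Kurt}[X] - 3

G_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{4}
      - \frac{3(n-1)^2}{(n-2)(n-3)},
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
```

The estimator needs at least four observations. As n grows, G2 approaches the simple moment estimate m4/m2² - 3, so the adjustment matters mainly for small samples.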

Examples of kurtosis

If you have JMP on your computer, you can download the JMP data set Univariate Statistics Data.jmp for your own analysis.

Examine the kurtosis of the data in Univariate Statistics Data.jmp. You can visualize kurtosis using the box plot and histogram, and see the value of kurtosis in the Summary Statistics report.

Compare the length of the box to the length of the whiskers of the box plot. Which is longer? Compare the amount of data in the tails to the amount of data in the middle of the histogram. Which has more data?
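The same comparisons can be made numerically. Since the JMP data set is not reproduced here, this sketch uses simulated normal, Laplace (heavier-tailed), and uniform (lighter-tailed) samples. It assumes the common whisker rule in which each whisker runs to the most extreme point within 1.5 × IQR of the box, and it measures "data in the tails" as the fraction of points more than two standard deviations from the mean, compared with about 4.6% for an exact normal distribution. `box_plot_lengths` and `tail_fraction` are hypothetical names, and quartile conventions vary between packages.

```python
import random
from statistics import NormalDist, fmean, pstdev, quantiles

def box_plot_lengths(xs):
    """Return (box length, lower whisker, upper whisker, points beyond the whiskers).

    Assumes the 1.5 * IQR whisker rule; quartile conventions vary by package.
    """
    q1, _, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lo_end = min(x for x in xs if x >= lo_fence)  # whisker stops at the last point inside the fence
    hi_end = max(x for x in xs if x <= hi_fence)
    beyond = sum(x < lo_fence or x > hi_fence for x in xs)
    return iqr, q1 - lo_end, hi_end - q3, beyond

def tail_fraction(xs, k=2.0):
    """Fraction of points more than k standard deviations from the mean."""
    mean, sd = fmean(xs), pstdev(xs)
    return sum(abs(x - mean) > k * sd for x in xs) / len(xs)

rng = random.Random(11)
n = 20_000
data = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    "heavier (Laplace)": [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)],
    "lighter (uniform)": [rng.uniform(-1, 1) for _ in range(n)],
}
normal_ref = 2 * (1 - NormalDist().cdf(2.0))  # about 4.6% for an exact normal
for name, xs in data.items():
    box, w_lo, w_hi, beyond = box_plot_lengths(xs)
    print(f"{name:18s} box={box:.2f} whiskers=({w_lo:.2f}, {w_hi:.2f}) "
          f"beyond={beyond} tails={tail_fraction(xs):.1%} (normal {normal_ref:.1%})")
```

With the 1.5 × IQR rule the whiskers can never exceed 1.5 box lengths, so light tails show up as whiskers shorter than the box, while heavy tails show up mostly as many points beyond the whiskers rather than as longer whiskers.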