Skewness

What is skewness?

Skewness is a statistic used to describe the shape of the distribution of a continuous variable. It measures the tendency of the distribution to be more spread out on one side than on the other. A distribution that is approximately symmetric has a skewness statistic close to zero. If your distribution is more spread out on the left side (with data values piled up on the right), the skewness is negative. If your distribution is more spread out on the right side (with data values piled up on the left), the skewness is positive.

When should you use skewness?

Skewness is used to measure the asymmetry of a set of data. This measure is especially useful if you expect symmetry, but your data do not look symmetric when visualized with a histogram or box plot.

It is often true that for data that are positively skewed, the mean is larger than the median, and for data that are negatively skewed, the mean is smaller than the median. However, this relationship is not always true, especially in the case of multimodal data or where one distribution tail is much heavier than the other.
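The mean-versus-median pattern, and one way it can fail, can be checked on small samples. The sketch below uses made-up data and a hypothetical helper named skewness that implements the moment-based definition (third central moment divided by the second central moment raised to the power 1.5). It is an illustration under those assumptions, not the calculation any particular package performs.

```python
from statistics import mean, median


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using central moments."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Made-up samples, for illustration only.
samples = {
    "right-skewed": [1, 2, 2, 3, 3, 3, 4, 20],
    "left-skewed": [-20, -4, -3, -3, -3, -2, -2, -1],
    # Positive skewness even though the mean is below the median.
    "exception": [-1, -1, -1, -1, 0, 0, 0, 0, 0, 3],
}

for name, values in samples.items():
    print(
        f"{name}: mean={mean(values):.2f}, "
        f"median={median(values):.2f}, "
        f"skewness={skewness(values):.2f}"
    )
```

In the third sample, a single large value pulls the skewness above zero, while the cluster of values just below the median keeps the mean slightly below it.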

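To determine whether a skewness value is of concern, you can create a bootstrap confidence interval for the skewness statistic and check whether it contains zero. An interval that excludes zero suggests that the asymmetry is more than sampling noise.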
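A bootstrap interval of this kind can be sketched with the Python standard library. The example assumes a simple percentile bootstrap, which is only one of several bootstrap interval methods, and runs on simulated right-skewed data. The names skewness and bootstrap_skewness_ci, along with the defaults for n_boot, level, and seed, are hypothetical choices made for this illustration.

```python
import random


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using central moments."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def bootstrap_skewness_ci(values, n_boot=2000, level=0.95, seed=0):
    """Approximate percentile bootstrap confidence interval for skewness."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    while len(stats) < n_boot:
        resample = rng.choices(values, k=n)  # n draws with replacement
        # Skip constant resamples, for which skewness is undefined.
        if max(resample) > min(resample):
            stats.append(skewness(resample))
    stats.sort()
    tail = (1 - level) / 2
    lower = stats[round(tail * n_boot)]
    upper = stats[round((1 - tail) * n_boot) - 1]
    return lower, upper


# Simulated right-skewed data (exponential), for illustration only.
data_rng = random.Random(1)
data = [data_rng.expovariate(1.0) for _ in range(200)]

lower, upper = bootstrap_skewness_ci(data)
print(f"95% bootstrap CI for skewness: ({lower:.2f}, {upper:.2f})")
print("Interval contains zero:", lower <= 0 <= upper)
```

Fixing the seed makes the interval reproducible. With a different seed the endpoints shift slightly, which is expected for a resampling method.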

Skewness can be calculated for continuous data or count (numeric categorical) data.

Skewness is the standardized third central moment: the third central moment about the mean divided by the cube of the standard deviation.
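In symbols, one common way to write the moment-based definition for a sample of n values is shown below. The notation (g_1 for the skewness, m_k for the k-th central moment, and \bar{x} for the sample mean) is assumed here for illustration and does not come from a specific software manual.

```latex
g_1 = \frac{m_3}{m_2^{3/2}},
\qquad
m_k = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^k
```

Because m_2^{1/2} is a standard deviation (computed with divisor n), dividing by m_2^{3/2} removes the units of measurement, so skewness values can be compared across variables.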

How do you calculate skewness?

The skewness formula is tedious to apply by hand, so skewness is typically calculated using statistical software, as displayed below.
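For readers without a statistics package at hand, the calculation can be sketched in a few lines of Python. The sketch computes both the moment-based skewness and a bias-adjusted version that several statistical packages report for samples. Which version a given package uses should be checked in its documentation. The function names and the data values are hypothetical.

```python
import math


def moment_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using central moments."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def adjusted_skewness(values):
    """Bias-adjusted sample skewness: sqrt(n(n-1)) / (n-2) * moment skewness."""
    n = len(values)
    if n < 3:
        raise ValueError("adjusted skewness needs at least 3 values")
    return math.sqrt(n * (n - 1)) / (n - 2) * moment_skewness(values)


# Made-up data with a longer right tail, for illustration only.
data = [2, 3, 3, 4, 4, 4, 5, 5, 9, 14]

print(f"moment-based skewness: {moment_skewness(data):.3f}")
print(f"adjusted skewness:     {adjusted_skewness(data):.3f}")
```

The two versions always share the same sign, and the adjustment factor approaches 1 as the sample size grows, so the choice matters mainly for small samples.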

Examples of skewness

If you have JMP on your computer, you can download the JMP data set Univariate Statistics Data.jmp for your own analysis. (If you don't have access to JMP, download a free trial here.)

Examine the skewness of the data in the Univariate Statistics Data set. You can visualize the skewness using the histogram and box plot and see the skewness value in the Summary Statistics report.

Compare the left tail of the histogram to the right tail. Do the data pile up on one side or the other or in the middle? Is there more than one place where the data pile up? Compare the size of the left and right whiskers of the box plot. Is one longer than the other or are they about the same length?