The Empirical Rule
Defining the empirical rule
When you have normal data, the empirical rule allows you to understand it quickly. This rule is also called the “68-95-99.7% rule” or the “three sigma rule.” The rule describes the percentage of your data that is within one, two, or three standard deviations of the mean.
This is easier to understand by referring to the graph of a normal distribution in Figure 1. The center of the graph – zero on the x-axis – represents the mean of the data. The orange dotted vertical lines are drawn at one, two, and three standard deviations on either side of the mean.
Notice that about 68% of the data is within one standard deviation of the mean. Remember that the normal distribution is a theoretical population distribution. The population standard deviation uses the symbol σ (the Greek letter sigma). Sometimes, you will see this rule written as “68% of the data is within ±σ of the mean.”
Similarly, you can see that about 95% of the data falls within two standard deviations of the mean. This is often written as “95% of the data is within ±2σ of the mean.”
Finally, roughly 99.7% of the data is within three standard deviations of the mean. This is often written as “99.7% of the data is within ±3σ of the mean.”
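The three percentages are rounded values of the probability that a normally distributed value lies within k standard deviations of the mean, which equals erf(k/√2). The short Python sketch below computes them with the standard library; the function name coverage_within is chosen here for illustration and is not from the text.

```python
import math

def coverage_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    # For a normal distribution, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.2%}")
```

The results are about 68.27%, 95.45%, and 99.73%, which the rule rounds to 68%, 95%, and 99.7%.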
In practice, you will rarely know the true population mean or population standard deviation. Instead, you will estimate them using the sample mean and sample standard deviation and then apply the rule to those estimates.
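As a minimal sketch of that estimation step, the Python snippet below computes the sample mean and sample standard deviation and then builds the one, two, and three standard deviation ranges around the mean. The measurements are invented for illustration, and the function name empirical_intervals is an assumption, not something defined in the text.

```python
import statistics

def empirical_intervals(sample):
    """Return {k: (low, high)} for k = 1, 2, 3 sample standard deviations."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Hypothetical measurements in millimeters, made up for this example.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]

for k, (low, high) in empirical_intervals(measurements).items():
    print(f"mean ± {k} SD: {low:.2f} to {high:.2f} mm")
```

If the data is roughly normal, about 68%, 95%, and 99.7% of measurements would be expected to fall inside the three printed ranges.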
How to use the empirical rule
How might you apply the empirical rule in analyzing your data? Assuming your data is normally distributed, the empirical rule allows you to predict the likelihood that measured outcomes will fall within certain ranges. If you find that the percentage of outcomes occurring at various standard deviations from the mean deviates from the expected percentages described by the empirical rule, you have a valuable clue that something may be amiss.
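One way to make this comparison concrete is to count the share of observations that fall within one, two, and three sample standard deviations of the sample mean and set those shares beside the expected percentages. The Python sketch below does this on simulated normal data; the data, the seed, and the function name observed_coverage are assumptions made for illustration.

```python
import random
import statistics

# Expected shares under the empirical rule (rounded).
EXPECTED = {1: 0.6827, 2: 0.9545, 3: 0.9973}

def observed_coverage(data):
    """Fraction of observations within k sample standard deviations of the sample mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    n = len(data)
    return {k: sum(abs(x - mean) <= k * sd for x in data) / n for k in EXPECTED}

# Simulated, normally distributed data (hypothetical mean 50, standard deviation 2).
rng = random.Random(42)
simulated = [rng.gauss(50.0, 2.0) for _ in range(10_000)]

for k, share in observed_coverage(simulated).items():
    print(f"within {k} SD: observed {share:.3f}, expected {EXPECTED[k]:.3f}")
```

For data like this, the observed shares should sit close to the expected ones. Large gaps on real data are the kind of clue described above, although how large a gap matters depends on the sample size.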
One explanation could be that you have significant outliers in your data. For example, if your data consists of measurements of a target specification of a manufactured item, such as a dimension in millimeters, the outliers may mean that your manufacturing process is poorly controlled and needs attention.
Another explanation could be that your sample, for various reasons, is a poor representation of the larger population, or that your sample size is simply too small.
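The effect of sample size can be illustrated with a small simulation. The Python sketch below repeatedly draws samples from a normal distribution and records the share of each sample that lies within one sample standard deviation of its mean. The sample sizes, trial count, seed, and function names are all assumptions chosen for illustration.

```python
import random
import statistics

def share_within_one_sd(sample):
    """Fraction of a sample within one sample standard deviation of its mean."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    return sum(abs(x - mean) <= sd for x in sample) / len(sample)

def spread_of_share(n, trials=200, seed=1):
    """Smallest and largest one-SD share seen across repeated normal samples of size n."""
    rng = random.Random(seed)
    shares = [
        share_within_one_sd([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return min(shares), max(shares)

for n in (10, 1000):
    low, high = spread_of_share(n)
    print(f"n = {n}: share within 1 SD ranged from {low:.2f} to {high:.2f}")
```

With small samples, the share can land far from 68% even though the underlying data is normal, so a mismatch on a small sample is weak evidence of a problem.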