Descriptive Statistics
What are descriptive statistics?
There are two main branches of statistics: descriptive statistics and inferential statistics. The branch known as descriptive statistics is concerned with summarizing data in a meaningful way and describing its main features. Descriptive statistics include measures of central tendency (such as the mean or median) and measures of variability (such as the standard deviation or quantiles). Histograms and box plots are examples of graphs that can be used to display these statistics.
What’s the difference between descriptive statistics and inferential statistics?
Descriptive statistics are often calculated for a sample of data drawn from a population, and are used to describe key features of the sample. The population (illustrated on the left) includes all of the individuals or measurement values of interest. A sample (illustrated on the right) is a subset of the population.
In contrast to inferential statistical methods, which use probability and statistical models to draw conclusions about a larger population based on data from a sample, descriptive statistics do not attempt to generalize beyond the data in the sample. They summarize the information contained within the sample itself. Inferential methods can then use these sample statistics as the basis for making inferences about the characteristics of the population.
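The distinction between a population and a sample can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the population is simulated, and its size, mean, and standard deviation (10,000 values, 50 and 10) as well as the sample size of 100 are assumptions chosen only for the example.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated measurement values.
# The seed, mean (50), and standard deviation (10) are arbitrary choices.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# A sample is a subset of the population, drawn here without replacement.
sample = rng.sample(population, 100)

# Descriptive statistics summarize only the 100 sampled values.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# The population mean is normally unknown; it can be computed here
# only because the whole population was simulated.
population_mean = statistics.mean(population)

print(f"sample mean = {sample_mean:.2f}, sample SD = {sample_sd:.2f}")
print(f"population mean = {population_mean:.2f}")
```

The sample mean and standard deviation describe the sample only. Deciding how close the sample mean is likely to be to the population mean is a question for inferential methods.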
What are examples of descriptive statistics?
The table below lists common descriptive statistics and brief definitions. Measures of central tendency are statistics that describe a “typical” value around which the data points tend to cluster, such as the mean, median, mode, and geometric mean. Measures of variability are statistics that describe the extent to which the data points tend to deviate from, or spread around, the central tendency. The standard deviation, variance, quantiles, and the interquartile range (IQR) are examples of measures of variability. Moments are measures that describe key characteristics of a data distribution, like the center, spread, and shape. They include the mean, variance, skewness, and kurtosis.
| Term | Definition |
| --- | --- |
| Mean | The arithmetic average of the data |
| Median | The middle value in the data when the data are ordered |
| Mode | The value occurring most frequently in the data |
| Geometric mean | The nth root of the product of the n data values; defined for positive data |
| Standard deviation | The spread of the data values around the mean, expressed on the same scale as the data |
| Variance | The spread of the data values around the mean, expressed on a squared scale |
| Quantiles | Values in a data set where a given proportion of the observations fall at or below that value |
| Percentiles | Specialized quantiles that divide the data into hundredths |
| Quartiles | Specialized quantiles that divide the data into quarters |
| Interquartile range (IQR) | The difference between the third and first quartiles (75th and 25th percentiles); the width of the interval containing the middle 50% of the data |
| Skewness | The departure from symmetry of a data distribution |
| Kurtosis | How “heavy” or “light” the tails of a data distribution are |
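As a rough sketch, the statistics in the table can be computed with Python's standard library. The data set below is hypothetical and chosen only for illustration. Two conventions are assumptions of this example rather than the only options: the standard deviation and variance use the sample (n - 1) formulas, and skewness and kurtosis are computed from population-style central moments, with kurtosis reported as excess kurtosis (a normal distribution gives 0). Statistical software may apply different bias corrections and quartile definitions, so results can differ slightly between tools.

```python
import math
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]
n = len(data)

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
geo_mean = statistics.geometric_mean(data)  # requires positive values

# Measures of variability (sample versions, dividing by n - 1)
sd = statistics.stdev(data)
variance = statistics.variance(data)

# Quartiles are the quantiles that split the data into four parts.
# statistics.quantiles uses its default "exclusive" method here.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Shape measures from central moments (dividing by n, no bias correction)
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"mean={mean}, median={median}, mode={mode}, geometric mean={geo_mean:.3f}")
print(f"SD={sd:.3f}, variance={variance}, Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}")
```

For this data the mean (6) lies above the median (5) and the skewness is positive, which is consistent with a longer right tail.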