Statistical Inference
What is statistical inference?
Statistical inference involves making decisions about a population based on data from a sample.
Inferential vs descriptive statistics
Inferential statistics are used to make a decision or draw conclusions about a population based on sample data.
Descriptive statistics are used to summarize a set of data. The goal is simply to summarize the data you have.
For example, suppose a clinic collects vital signs on patients, such as temperature, blood pressure, and heart rate. With descriptive statistics, the clinic summarizes the vital signs of the patients who have recorded measurements. With inferential statistics, the clinic uses the data on those patients to estimate values for the whole population of patients, including those who have not had vital signs recorded. If the clinic wants to use the data to make a decision or draw conclusions about the whole population, then the clinic is doing statistical inference.
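The difference between the two uses in the clinic example can be sketched in a few lines of Python. The heart rate values below are hypothetical, made up only for illustration, and the interval uses a normal approximation, which is an assumption of this sketch (with a sample this small, a t-based interval would usually be preferred).

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical resting heart rates (beats per minute) for patients
# whose vital signs were recorded. These numbers are invented.
heart_rates = [72, 68, 75, 80, 66, 71, 77, 69, 74, 73, 70, 78]

# Descriptive statistics: summarize only the data in hand.
sample_mean = statistics.mean(heart_rates)
sample_sd = statistics.stdev(heart_rates)

# Inferential statistics: use the sample to estimate the population mean.
# Assumption: normal approximation for an approximate 95% interval.
z = NormalDist().inv_cdf(0.975)
standard_error = sample_sd / math.sqrt(len(heart_rates))
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"Sample mean: {sample_mean:.2f} bpm (SD {sample_sd:.2f})")
print(f"Approximate 95% interval for the population mean: "
      f"({ci_low:.2f}, {ci_high:.2f})")
```

The first two numbers describe only the measured patients. The interval is a statement about the population those patients were sampled from.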
Planned and exploratory analyses
Planned analyses involve four primary steps:
- A question or hypothesis is defined. Sometimes this is called a study endpoint.
- An experiment or data collection plan is defined, which includes estimating a sufficient number of data points to collect.
- An analysis plan is defined, specifying the statistical methods you plan to use. This is sometimes called a statistical analysis plan (SAP).
- Data are collected, analyzed, and used to draw conclusions.
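The second step above mentions estimating a sufficient number of data points. As a rough sketch, one common textbook formula for planning a proportion estimate is n = z² p(1 − p) / E², where E is the desired margin of error. The function name, the 5% margin, and the default of p = 0.5 (the most conservative choice) are assumptions made for this illustration, not part of any specific study plan.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, expected_p=0.5):
    """Approximate sample size needed to estimate a proportion.

    Hypothetical helper for illustration. Uses the normal-approximation
    formula n = z^2 * p * (1 - p) / margin^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / margin ** 2)

# Assumed planning inputs: 95% confidence, margin of error of 0.05.
required = sample_size_for_proportion(margin=0.05)
print(f"Plan to collect at least {required} observations")
```

Real study designs often need more than this, for example when comparing groups or when a target power for a hypothesis test is specified.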
For example, clinical trials for new drugs use a designed experiment to assign different drugs to patients. The goal of the planned analysis is to use the data to compare the effectiveness of each drug.
Sometimes you have an idea about the population that you want to test, so you define an analysis and then collect a random sample of data. For example, you might want to know to what extent airline passengers shop in the airport. You plan to estimate the proportion of passengers who shop at the airport. To do so, you collect a random sample of airline passengers. You plan to use the data from this sample to make a decision about the whole population.
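The airport example can be sketched as follows. The counts are hypothetical, and the interval is the simple normal-approximation (Wald) interval, chosen here only because it is short to write down. Other interval methods exist and may behave better for small samples or extreme proportions.

```python
import math
from statistics import NormalDist

# Hypothetical survey results from a random sample of passengers.
n_sampled = 400   # passengers surveyed (invented number)
n_shopped = 140   # passengers who reported shopping (invented number)

# Point estimate of the population proportion.
p_hat = n_shopped / n_sampled

# Approximate 95% interval using the normal approximation.
z = NormalDist().inv_cdf(0.975)
standard_error = math.sqrt(p_hat * (1 - p_hat) / n_sampled)
ci_low = p_hat - z * standard_error
ci_high = p_hat + z * standard_error

print(f"Estimated proportion who shop: {p_hat:.3f}")
print(f"Approximate 95% interval: ({ci_low:.3f}, {ci_high:.3f})")
```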
Exploratory analyses are often done as a first step in learning about the data. Exploratory data analyses (EDA) use graphs and descriptive statistics. In planned analyses, you can use EDA to check the data for errors. EDA does not use hypothesis tests, but it can help develop ideas for testing later with a different set of data.
With EDA, you look at the shape, center, and spread of the distribution of data. You also look for unusual values or outliers. You check whether some variables might affect the variable you are studying. For example, you might graph property values for rural and urban homeowners separately to see if the location has an impact on property value.
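A numeric version of that kind of exploration might look like the sketch below. The property values are hypothetical, and the 1.5 × IQR rule used to flag unusual values is one common convention, assumed here for illustration. In practice you would usually pair summaries like these with graphs.

```python
import statistics

# Hypothetical property values, in thousands of dollars, by location.
property_values = {
    "rural": [120, 135, 128, 142, 131, 125, 138, 410],
    "urban": [310, 295, 330, 342, 305, 318, 299, 325],
}

def summarize(values):
    """Return center, spread, and values flagged by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "median": median,
        "iqr": iqr,
        "min": min(values),
        "max": max(values),
        "outliers": outliers,
    }

# Summarize each location separately, as in the example above.
summaries = {group: summarize(vals) for group, vals in property_values.items()}
for group, summary in summaries.items():
    print(group, summary)
```

A flagged value is a prompt to check the record for errors, not proof that the value is wrong.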
CAUTION! Some statisticians will warn you not to use your data to "snoop around" and get ideas for testing. Inferential tests are usually built on the assumption that you set up the test hypotheses before, and independently of, collecting the data. The stated error rates and power of those tests assume that you did not already see a pattern that prompted you to run that test. Violating that assumption invalidates the power calculations and the Type I and Type II error rates for the tests.
Most statisticians will advise that you explore your data to find unusual values and to describe the distributions, but NOT to decide which inferential tests you will use.
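A small simulation can illustrate why this caution matters. In the sketch below, every group is drawn from the same population, so any "significant" difference is a false positive. The setup (five groups, 20 observations each, a z-test with a known standard deviation of 1, and the function names) is an assumption made for this illustration.

```python
import math
import random
from statistics import NormalDist, mean

def z_test_rejects(sample_a, sample_b, alpha=0.05):
    """Two-sided z-test for equal means, assuming a known SD of 1."""
    standard_error = math.sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def simulate(n_sims=2000, n_groups=5, n_per_group=20, seed=1):
    """Estimate false positive rates for planned and snooped comparisons."""
    rng = random.Random(seed)
    planned_hits = 0
    snooped_hits = 0
    for _ in range(n_sims):
        # All groups share one population: no true differences exist.
        groups = [[rng.gauss(0, 1) for _ in range(n_per_group)]
                  for _ in range(n_groups)]
        # Planned: the comparison was fixed before seeing any data.
        planned_hits += z_test_rejects(groups[0], groups[1])
        # Snooped: look first, then test the two most different groups.
        ordered = sorted(groups, key=mean)
        snooped_hits += z_test_rejects(ordered[0], ordered[-1])
    return planned_hits / n_sims, snooped_hits / n_sims

planned_rate, snooped_rate = simulate()
print(f"False positive rate, planned comparison: {planned_rate:.3f}")
print(f"False positive rate, snooped comparison: {snooped_rate:.3f}")
```

The planned comparison should reject at a rate close to the nominal 5%, while the comparison chosen after looking at the data rejects far more often, even though nothing real is there to find.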
Overview of analyses
The table below lists common analyses for inferential statistics and provides links to other topics in SKP.
| Analysis | Description | Example |
| --- | --- | --- |
|  | Test if the means for groups are equal or not. Groups are formed by the combinations of values of two categorical or nominal variables. |  |
|  | Test if the means for groups are equal or not. Groups are formed by the combinations of values of categorical or nominal variables. |  |
|  | Test if the means for groups are equal or not, taking into account the impact of a continuous variable (called a covariate). Groups are formed by the combinations of values of categorical or nominal variables. |  |
|  | Test if the means for groups are equal or not, taking into account the change in response over time. Groups are formed by the combinations of values of categorical or nominal variables. |  |
|  | Test if the means for groups are equal or not; allow for interactions between variables. Groups are formed by the combinations of values of categorical or nominal variables. |  |
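To make "groups formed by the combinations of values of categorical variables" concrete, the sketch below builds groups from two hypothetical categorical variables and compares their means. The records are invented, and the F statistic treats every combination as one group in a single comparison. This is a simplification assumed for illustration. It is not the full decomposition into main effects and interactions that a two-way analysis would report.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (location, home type, value in thousands of dollars).
records = [
    ("rural", "house", 130), ("rural", "house", 138),
    ("rural", "condo", 110), ("rural", "condo", 118),
    ("urban", "house", 320), ("urban", "house", 334),
    ("urban", "condo", 260), ("urban", "condo", 272),
]

# Each combination of the two categorical variables defines one group.
groups = defaultdict(list)
for location, home_type, value in records:
    groups[(location, home_type)].append(value)

group_means = {key: mean(values) for key, values in groups.items()}

# F statistic for "all group means are equal", pooling the combinations
# into one set of groups (a simplification, see the note above).
all_values = [v for values in groups.values() for v in values]
grand_mean = mean(all_values)
n_groups = len(groups)
n_total = len(all_values)
ss_between = sum(len(values) * (mean(values) - grand_mean) ** 2
                 for values in groups.values())
ss_within = sum((v - mean(values)) ** 2
                for values in groups.values() for v in values)
f_statistic = (ss_between / (n_groups - 1)) / (ss_within / (n_total - n_groups))

for key, group_mean in group_means.items():
    print(key, group_mean)
print(f"F statistic: {f_statistic:.1f}")
```

A large F statistic suggests the group means differ by more than the variation within groups would explain. Turning it into a p-value requires the F distribution, which statistical software provides.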