Inferential Statistics


What are inferential statistics?

The branch of statistics known as inferential statistics or statistical inference is concerned with drawing conclusions about a population based on a sample drawn from that population. It relies on probability theory to quantify uncertainty and determine how confident we can be that the sample results reflect the true characteristics of the population.

Statistical analysis generally follows four steps:

Identify the population of interest (problem definition).
Draw a representative sample from the population.
Compute sample statistics to describe the sample.
Use sample statistics to make inferences about the population.

What are some common types of statistical studies?

Several types of studies are commonly used in statistical practice. They include observational or retrospective studies, designed experiments, acceptance sampling, and process monitoring.

Observational studies try to draw inferences about the effects of treatments when the treatments or assignment of subjects might be uncontrolled. Observational studies are often based on data collected in the past, perhaps for different purposes than the current study. In these retrospective studies, there is no opportunity to specify or adjust the collection of the data. An observational study can be difficult to analyze due to deficient data, and it does not let you establish cause-and-effect relationships. For example, when analyzing data from a historical database, there might be data problems that require extensive effort to correct, or analytical problems like inseparable effects, missing important factors, or changes in the process during the observation period.

Designed and controlled experiments involve the researcher systematically changing inputs to collect data to better understand or optimize a process. Unlike observational studies, properly designed experiments allow you to establish cause-and-effect relationships.

Acceptance sampling uses controlled samples to provide enough power to detect defects. For example, in lot acceptance sampling, the goal is to determine whether to accept or reject a lot of parts based on the defects found in a sample from the lot. If there are too many defects in the sample, you might reject the entire lot. Acceptance sampling defines the sample size, sampling plan, and acceptable number of defects for the sample.
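The accept-or-reject decision described above can be sketched in a few lines of Python. This is only an illustration: the function names, the plan (50 parts sampled, at most 2 defects allowed), and the 1% defect rate are hypothetical, and the probability of acceptance uses a binomial model, which assumes the lot is large relative to the sample.

```python
from math import comb


def accept_lot(defects_in_sample: int, acceptance_number: int) -> bool:
    """Accept the lot when the sample holds no more defects than the plan allows."""
    return defects_in_sample <= acceptance_number


def probability_of_acceptance(n: int, c: int, defect_rate: float) -> float:
    """P(at most c defects among n sampled parts), under a binomial model.

    Assumption: the lot is much larger than the sample, so each sampled
    part is defective independently with probability defect_rate.
    """
    return sum(
        comb(n, k) * defect_rate**k * (1 - defect_rate) ** (n - k)
        for k in range(c + 1)
    )


# Hypothetical plan: sample 50 parts, accept the lot with at most 2 defects.
print(accept_lot(defects_in_sample=1, acceptance_number=2))
print(accept_lot(defects_in_sample=3, acceptance_number=2))
print(probability_of_acceptance(n=50, c=2, defect_rate=0.01))
```

Evaluating `probability_of_acceptance` over a range of defect rates shows how often a given plan would accept lots of different quality, which is one way to compare candidate sample sizes and acceptance numbers.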

Process monitoring collects data, usually at regular intervals, to detect changes in the process mean or variability over time.
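As a rough sketch of process monitoring, the Python below flags new measurements that fall outside limits computed from a baseline period. The three-standard-deviation rule, the function names, and the data are assumptions made for illustration; control charts used in practice often estimate variability differently (for example, from moving ranges).

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return (lower, upper) limits at 3 sample standard deviations from the baseline mean."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - 3 * spread, center + 3 * spread


def out_of_control(values, limits):
    """Return the positions of values that fall outside the limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical baseline measurements taken while the process was stable.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)

# Hypothetical new measurements collected at regular intervals.
new_values = [10.0, 10.1, 11.5, 9.9]
flagged = out_of_control(new_values, limits)
print(flagged)  # [2]: only the third measurement lies outside the limits
```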

What are populations and samples?

A population is the set of all measurement values of interest. You identify your population when you define your problem or question.

A population is referred to as concrete if you can identify every unit or subject in the population. For example, at any one point in time, you can identify each person on the company payroll. These people make up a concrete population. It’s technically possible, though not always practical, to measure every individual in a concrete population. When you do measure every individual, it’s called a census.

A population is referred to as theoretical if the population is constantly changing. For example, if the population of interest is all widgets manufactured by a company, the population consists of the widgets that exist now, as well as all the widgets that the company made in the past, and all the widgets that will be made in the future. In general, it’s impossible to measure all individuals in a theoretical population.

A sample is a subset of measurement values collected from a population. A representative sample has characteristics that are similar to the population’s characteristics. Selecting the sample randomly helps make it representative, although random selection does not guarantee it for any single sample.

What is the sampling plan?

The sampling plan describes how a sample of data is drawn from a population. It is important to remember that you use the information that is contained in the sample to make conclusions about the population. The sampling plan determines the population for which you can make inferences.

One sampling method that helps ensure you have a representative sample is called simple random sampling. In a simple random sample, every member of the population has an equal chance of being included in the sample. More specifically, any sample of size n is as likely to be selected as any other sample of size n. Many statistical analyses assume that you have a simple random sample.

Here are some examples of simple random samples:

A quality team in the semiconductor industry wants to inspect wafers for defects. They randomly select 50 wafers from the 5,000 wafers that are produced in a day. Each wafer has an equal chance of being chosen.
In drug formulation development, a researcher wants to test for stability of the active pharmaceutical ingredient. She randomly selects samples from multiple batches.
An engineer wants to monitor the concentration of a chemical in a large storage tank. He uses a sampling probe to collect liquid at randomly selected depths. Randomly sampling locations helps detect uneven mixing.
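The wafer example can be sketched with Python's standard library. The wafer IDs (1 through 5,000) and the fixed seed are assumptions made for illustration; `random.sample` draws without replacement, so every set of 50 wafers is equally likely to be chosen.

```python
import random

# Fixed seed so the sketch is reproducible; omit it for a fresh draw each run.
rng = random.Random(42)

# Hypothetical IDs for the 5,000 wafers produced in a day.
wafer_ids = list(range(1, 5001))

# Simple random sample: 50 distinct wafers, each subset of size 50 equally likely.
sample_ids = rng.sample(wafer_ids, k=50)

print(len(sample_ids))
print(sorted(sample_ids)[:5])
```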

Convenience sampling means selecting the values in a population that are most easily available to you. This approach can lead to biased samples. A biased sample is one that systematically favors certain outcomes, and therefore, it is not representative of the population from which it is drawn. In the illustration below, there is a particular characteristic that varies depending on where or when an observation is collected (the shading could represent spatial or temporal variation, for example).

Here are some examples of convenience samples and how they can bias results:

Instead of randomly selecting wafers, the quality team inspects the first 50 wafers produced in the morning run. These wafers are easy to access but might not represent the entire day’s production, especially if defects vary by time of day or equipment conditions.
The researcher uses samples only from the most recent batch. Because there is potentially some variability from batch to batch, these samples might not be representative of the population she is interested in testing.
The engineer takes samples from the most accessible location, such as dipping a container into the top of the tank or using the bottom drain to collect a sample. In both cases, the samples might not represent the entire tank’s contents, especially if the liquid is layered or settling occurs.
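A small simulation can make the bias concrete. The sketch below assumes a hypothetical defect score that drifts steadily upward over a day's production of 5,000 wafers; the data, the drift, and the seed are invented for illustration only.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical defect scores that drift upward from the first wafer to the last.
scores = [1.0 + 0.001 * i for i in range(5000)]

population_mean = mean(scores)

# Convenience sample: the first 50 wafers of the morning run.
convenience_mean = mean(scores[:50])

# Simple random sample: 50 wafers drawn from the whole day.
random_mean = mean(rng.sample(scores, k=50))

print(f"population:  {population_mean:.3f}")
print(f"convenience: {convenience_mean:.3f}")
print(f"random:      {random_mean:.3f}")
```

Because the first 50 wafers all come from the low end of the drift, their mean sits well below the population mean, while the random sample's mean varies around the population mean from draw to draw.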

However, convenience sampling is widely used because it can be unrealistic or very expensive to collect a simple random sample. If that’s the case, you must make the non-trivial assumption that the sample that you obtain represents the population of interest, and you must believe that the process that generates the data is stable over time.

What is the process of statistical data analysis?

Statistical data analysis follows a four-step process:

Identify the population of interest (problem definition).
Draw a representative sample (preferably a simple random sample).
Compute sample statistics to describe the sample.
Use sample information to make inferences about the population.

Descriptive statistics (for example, the sample mean or standard deviation) are used to organize, summarize, and focus on the main characteristics of your sample (the data that you collected). Summarizing your data in such a manner also makes the data more usable. Rather than dealing with individual data points, you want to reduce the data to single values that capture essential characteristics. Statistics capture important qualities such as the location, variation, shape, and frequency of values.
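As a brief illustration, the following Python sketch reduces a small sample to a few descriptive statistics using the standard library. The measurements are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical sample of eight measurements.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

summary = {
    "n": len(sample),
    "mean": mean(sample),      # location
    "median": median(sample),  # location, less sensitive to extreme values
    "stdev": stdev(sample),    # variation (sample standard deviation, n - 1 divisor)
    "min": min(sample),
    "max": max(sample),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```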

Inferential statistics allow you to make statements about the true characteristics of a population (such as the true population mean) based on your sample statistics and statistical theory. Inferential statistical methods are used to:

Calculate confidence intervals.
Test statistical hypotheses.
Fit statistical models.
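To illustrate the first of these methods, the sketch below computes an approximate confidence interval for a population mean. It rests on stated assumptions: the sample is treated as a simple random sample, the critical value comes from the normal distribution (a large-sample approximation; small samples usually call for a t-distribution critical value instead), and the function name and simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: normal (z) critical value, reasonable for larger samples.
    """
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error


# Hypothetical sample: 40 simulated measurements, seeded for reproducibility.
rng = random.Random(1)
sample = [rng.gauss(5.0, 0.2) for _ in range(40)]

lower, upper = mean_confidence_interval(sample)
print(f"sample mean: {mean(sample):.3f}")
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

Raising the `confidence` argument widens the interval, and a larger sample narrows it, because the standard error shrinks with the square root of the sample size.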

When you finish the process, you might want to repeat it with new data or with a new question.
