Hypothesis Testing
What is statistical hypothesis testing?
Hypothesis testing is a cornerstone of statistical inference. A statistical hypothesis test involves making a decision between two competing hypotheses. The null hypothesis ($H_0$) is a statement about the assumed value of a population parameter; it is usually a hypothesis of no difference or no relationship. The alternative hypothesis ($H_1$) is a statement about the value of a population parameter that you want to test; it is usually a hypothesis that there is some difference or some relationship. Researchers often define the alternative hypothesis first and state the null as its logical opposite.
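For instance (a hypothetical example), suppose you suspect that a bottling line is not filling bottles to its 500 ml target. The null hypothesis would be $H_0$: $\mu = 500$, and the alternative hypothesis would be $H_1$: $\mu \neq 500$.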
What are the steps in hypothesis testing?
The steps to perform a hypothesis test are:
- Determine the null ($H_0$) and alternative ($H_1$) hypotheses. The null hypothesis is assumed to be true when you start your analysis. It is the logical opposite of your suspicion.
- Select a significance level. The significance level is the amount of evidence needed to overturn your assumption that the null hypothesis is true.
- Collect evidence (data).
- Use a decision rule to make a judgment. If the evidence in the data is sufficiently strong, based on the selected significance level, then reject the null hypothesis. If the evidence in the data is not strong enough, fail to reject the null hypothesis. It is important to note, however, that failing to reject the null hypothesis does not prove that the null hypothesis is true. (These steps are sketched in code after this list.)
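As a rough sketch of these steps in code (assuming Python with NumPy and SciPy; the fill-volume data are made up for illustration and continue the bottling example above):

```python
import numpy as np
from scipy import stats

# Step 1: hypotheses.
#   H0: the population mean fill volume is 500 ml (mu = 500)
#   H1: the population mean fill volume is not 500 ml (mu != 500)
mu_0 = 500

# Step 2: select a significance level.
alpha = 0.05

# Step 3: collect evidence (hypothetical sample of fill volumes, in ml).
sample = np.array([498.2, 501.1, 497.5, 499.0, 502.3, 496.8, 500.4, 498.9])

# Step 4: apply the decision rule using a one-sample t-test.
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0 (this does not prove H0)")
```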
Hypothesis testing analogy
An analogy might be useful for illustrating the steps of a hypothesis test. Let’s imagine a scenario where a defendant has been charged with a crime and is on trial in a court of law.
- The null hypothesis ($H_0$) is that the defendant is not guilty; this is assumed to be true at the outset of the trial. The alternative hypothesis ($H_1$) is that the defendant is guilty.
- How much evidence do you require (the significance level) to reject your assumption that the defendant is not guilty? In this analogy, the criterion is evidence beyond a reasonable doubt.
- Collect the evidence.
- Weigh the evidence against the significance level to make your decision. If there is evidence beyond a reasonable doubt, reject the not guilty hypothesis ($H_0$); the evidence supports a conclusion of guilty ($H_1$). If there is no evidence beyond a reasonable doubt, you fail to reject the not guilty hypothesis ($H_0$).
Of course, failing to reject the not guilty hypothesis doesn’t mean that the defendant has been proven innocent – it simply means that there was not enough evidence to conclude guilt. And rejecting the not guilty hypothesis doesn’t guarantee that the defendant really is guilty. There is always a risk of making an error when you make a decision. In statistical hypothesis testing, these are known as Type I and Type II errors.
What are Type I and Type II errors?
You perform a hypothesis test and make a decision, but was the decision correct? In reality, only one of the two hypotheses can be true.
If the null hypothesis is true and you fail to reject it, you have made a correct decision. If the null hypothesis is true and you reject it, you have made a Type I error. You control the probability of making a Type I error through the significance level $\alpha$.
If the null hypothesis is false and you reject it, you have made a correct decision. If the null hypothesis is false and you fail to reject it, you have made a Type II error. The probability of making a Type II error is $\beta$.
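The four possible outcomes can be summarized in a table:

| Decision | $H_0$ is true | $H_0$ is false |
| --- | --- | --- |
| Fail to reject $H_0$ | Correct decision | Type II error ($\beta$) |
| Reject $H_0$ | Type I error ($\alpha$) | Correct decision |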
The power of the test is $1 - \beta$, which is the probability of correctly rejecting the null hypothesis when it is false. In other words, it is the probability of finding a real difference or a real relationship in the data. You can often increase the power of your test by increasing the sample size.
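As a minimal sketch of how power grows with sample size (assuming a one-sided z-test with a known standard deviation; the means, standard deviation, and sample sizes below are hypothetical):

```python
from scipy import stats

alpha = 0.05           # Type I error rate
mu_0, mu_1 = 500, 502  # null-hypothesis mean and assumed true mean (hypothetical)
sigma = 4.0            # assumed known population standard deviation

# Power of a one-sided z-test (H1: mu > mu_0) for several sample sizes.
z_crit = stats.norm.ppf(1 - alpha)              # critical value under H0
for n in (5, 10, 20, 40):
    shift = (mu_1 - mu_0) / (sigma / n ** 0.5)  # standardized true effect
    power = 1 - stats.norm.cdf(z_crit - shift)  # P(reject H0 | true mean is mu_1)
    print(f"n = {n:3d}: power = {power:.2f}")
```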
What is a reference distribution?
A reference distribution is the probability distribution of the test statistic assuming the null hypothesis is true. It is used to calculate p-values.
Common reference distributions in hypothesis testing are the t distribution and the F distribution.
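As a small sketch (assuming SciPy is available), you can work with these reference distributions directly, for example to look up critical values; the degrees of freedom below are chosen only for illustration:

```python
from scipy import stats

# t distribution with 10 degrees of freedom: critical value for a
# two-sided test at alpha = 0.05 (i.e., the 97.5th percentile).
t_crit = stats.t.ppf(0.975, df=10)

# F distribution with 3 and 20 degrees of freedom: 95th percentile.
f_crit = stats.f.ppf(0.95, dfn=3, dfd=20)

print(f"t critical value (df=10):    {t_crit:.3f}")
print(f"F critical value (3, 20 df): {f_crit:.3f}")
```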
What is a p-value?
A reference distribution enables you to quantify the probability of observing a particular outcome (the calculated test statistic) or a more extreme outcome if the null hypothesis is true. That probability is called the p-value.
A large p-value indicates a high probability of observing your results or more extreme results, given that $H_0$ is true. Therefore, it is reasonable to continue to assume $H_0$ is true, and you fail to reject the null hypothesis. A small p-value indicates a low probability of observing your results or more extreme results, given that $H_0$ is true. Therefore, it is no longer reasonable to assume that $H_0$ is true, and you reject the null hypothesis.
The p-value is a number between zero and one, inclusive. It is a probability that is calculated from your data.
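As a minimal illustration (assuming SciPy; the test statistic and degrees of freedom below are hypothetical), a two-sided p-value can be computed from the t reference distribution:

```python
from scipy import stats

t_stat = 2.3  # hypothetical calculated test statistic
df = 15       # hypothetical degrees of freedom

# Probability of a result at least this extreme in either tail, given that H0 is true.
p_value = 2 * stats.t.sf(abs(t_stat), df=df)
print(f"two-sided p-value: {p_value:.3f}")
```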
Summary of statistical hypothesis tests
In a statistical hypothesis test, the following conditions exist:
- The alternative hypothesis is your suspicion. The null hypothesis is the logical opposite of your suspicion and is assumed to be true.
- The significance level is denoted by $\alpha$, the probability of a Type I error.
- The strength of the evidence against the null hypothesis is measured by the p-value.
- The decision rule (sketched in code after this list):
  - Reject the null hypothesis if the p-value is less than $\alpha$ (in other words, the observed result would be rare if $H_0$ were true).
  - Fail to reject the null hypothesis if the p-value is greater than or equal to $\alpha$.
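Expressed as a small helper function (a sketch only; the function name is illustrative):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule for a hypothesis test."""
    # A p-value below alpha is a rare outcome under H0, so reject H0.
    if p_value < alpha:
        return "reject H0"
    # Otherwise the evidence is not strong enough; fail to reject H0.
    return "fail to reject H0"

print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```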
What are some common hypothesis tests?
| Hypothesis test | Null hypothesis |
| --- | --- |
| One-sample t-test | $H_0$: $\mu = c$ |
| Two-sample t-test | $H_0$: $\mu_1 = \mu_2$ |
| One-way ANOVA | $H_0$: $\mu_1 = \mu_2 = \dots = \mu_k$ |
| Simple linear regression | $H_0$: $\beta_1 = 0$ |
| Multiple linear regression | $H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ |
| Equal variances | $H_0$: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$ |
| Correlation | $H_0$: $\rho = 0$ |
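As a brief sketch of how a few of these tests can be run (assuming Python with NumPy and SciPy; the two small samples are made up for illustration, and Levene's test is used here as one common equal-variances test):

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two groups.
group_a = np.array([5.1, 4.9, 5.4, 5.0, 5.2, 4.8])
group_b = np.array([5.6, 5.3, 5.8, 5.5, 5.7, 5.4])

# Two-sample t-test: H0: mu_1 = mu_2
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Equal variances (Levene's test): H0: sigma_1^2 = sigma_2^2
w_stat, lev_p = stats.levene(group_a, group_b)

# Correlation: H0: rho = 0
r, corr_p = stats.pearsonr(group_a, group_b)

print(f"two-sample t-test p-value: {t_p:.3f}")
print(f"equal-variances p-value:   {lev_p:.3f}")
print(f"correlation p-value:       {corr_p:.3f}")
```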