The One-Sample $t$-Test

What is the one-sample $t$-test?

The one-sample t-test is a statistical hypothesis test used to determine whether an unknown population mean is different from a specific value.

When can I use the test?

You can use the test for continuous data. Your data should be a random sample from a normal population.

What if my data isn’t nearly normally distributed?

If your sample size is very small, you might not be able to test for normality. You might need to rely on your understanding of the data. When you cannot safely assume normality, you can perform a nonparametric test that doesn’t assume normality.

Using the one-sample t-test

The sections below discuss what we need for the test, checking our data, performing the test, understanding test results and statistical details.

What do we need?

For the one-sample t-test, we need one variable.

We also have an idea, or hypothesis, that the mean of the population has some value. Here are two examples:

A hospital has a random sample of cholesterol measurements for men. These patients were seen for issues other than cholesterol. They were not taking any medications for high cholesterol. The hospital wants to know if the unknown mean cholesterol for patients is different from a goal level of 200 mg/dL.
We measure the grams of protein for a sample of energy bars. The label claims that the bars have 20 grams of protein. We want to know if the labels are correct or not.

One-sample t-test assumptions

For a valid test, we need data values that are:

Independent (values are not related to one another).
Continuous.
Obtained via a simple random sample from the population.

Also, the population is assumed to be normally distributed.

See how to perform a one-sample $t$-test using statistical software

https://share.vidyard.com/watch/SFz5v4tJ19H7zNBvhcdxEW


One-sample t-test example

Imagine we have collected a random sample of 31 energy bars from a number of different stores to represent the population of energy bars available to the general consumer. The labels on the bars claim that each bar contains 20 grams of protein.

Table 1: Grams of protein in random sample of energy bars

Energy Bar - Grams of Protein
20.69	27.46	22.15	19.85	21.29	24.75
20.75	22.91	25.34	20.33	21.54	21.08
22.14	19.56	21.10	18.04	24.12	19.95
19.72	18.28	16.26	17.46	20.53	22.12
25.06	22.44	19.08	19.88	21.39	22.33	25.79

If you look at the table above, you see that some bars have less than 20 grams of protein. Other bars have more. You might think that the data support the idea that the labels are correct. Others might disagree. The statistical test provides a sound method to make a decision, so that everyone makes the same decision on the same set of data values.

Checking the data

Let’s start by answering: Is the t-test an appropriate method to test that the energy bars have 20 grams of protein? The list below checks the requirements for the test.

The data values are independent. The grams of protein in one energy bar do not depend on the grams in any other energy bar. An example of dependent values would be if you collected energy bars from a single production lot. A sample from a single lot is representative of that lot, not energy bars in general.
The data values are grams of protein. The measurements are continuous.
We assume the energy bars are a simple random sample from the population of energy bars available to the general consumer (i.e., a mix of lots of bars).
We assume the population from which we are collecting our sample is normally distributed, and for large samples, we can check this assumption.

We decide that the t-test is an appropriate method.

Before jumping into analysis, we should take a quick look at the data. The figure below shows a histogram and summary statistics for the energy bars.

Figure 1: Histogram and summary statistics for the grams of protein in energy bars

From a quick look at the histogram, we see that there are no unusual points, or outliers. The data look roughly bell-shaped, so our assumption of a normal distribution seems reasonable.

From a quick look at the statistics, we see that the average is 21.40, above 20. Does this average from our sample of 31 bars invalidate the label's claim of 20 grams of protein for the unknown entire population mean? Or not?

How to perform the one-sample $t$-test

For the t-test calculations we need the mean, standard deviation and sample size. These are shown in the summary statistics section of Figure 1 above.

We round the statistics to two decimal places. Software will show more decimal places, and use them in calculations. (Note that Table 1 shows only two decimal places; the actual data used to calculate the summary statistics has more.)

We start by finding the difference between the sample mean and 20:

$ 21.40-20\ =\ 1.40$

Next, we calculate the standard error for the mean. The calculation is:

Standard Error for the mean = $ \frac{s}{\sqrt{n}}= \frac{2.54}{\sqrt{31}}=0.456 $

This matches the value in Figure 1 above.

We now have the pieces for our test statistic. We calculate our test statistic as:

$ t = \frac{\text{Difference}}{\text{Standard Error}}= \frac{1.40}{0.456}=3.07 $
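The same arithmetic can be sketched in a few lines of Python using only the standard library. This is an illustration, not the output of any particular statistics package: the variable names are our own, and the values are the two-decimal figures from Table 1, so results can differ slightly in the last digit from software that works with the unrounded data.

```python
import math
import statistics

# Table 1 values, rounded to two decimals (the source data has more precision).
protein = [
    20.69, 27.46, 22.15, 19.85, 21.29, 24.75,
    20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
    22.14, 19.56, 21.10, 18.04, 24.12, 19.95,
    19.72, 18.28, 16.26, 17.46, 20.53, 22.12,
    25.06, 22.44, 19.08, 19.88, 21.39, 22.33, 25.79,
]
claimed_mean = 20.0  # label claim, used as the null hypothesis value

n = len(protein)
sample_mean = statistics.mean(protein)
sample_sd = statistics.stdev(protein)        # sample standard deviation (n - 1 divisor)
std_error = sample_sd / math.sqrt(n)         # standard error of the mean
t_stat = (sample_mean - claimed_mean) / std_error

print(f"n = {n}, mean = {sample_mean:.2f}, s = {sample_sd:.2f}")
print(f"standard error = {std_error:.3f}, t = {t_stat:.2f}")
```

Because the code carries full floating-point precision through each step, the standard error may differ in the third decimal from the hand calculation, which starts from the rounded value s = 2.54.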

To make our decision, we compare the test statistic to a value from the t-distribution. This activity involves four steps.

We calculate a test statistic. Our test statistic is 3.07.
We decide on the risk we are willing to take for declaring a difference when there is not a difference. For the energy bar data, we decide that we are willing to take a 5% risk of saying that the unknown population mean is different from 20 when in fact it is not. In statistics-speak, we set α = 0.05. In practice, you should set your risk level (α) before collecting the data.
We find the value from the t-distribution based on our decision. For a t-test, we need the degrees of freedom to find this value. The degrees of freedom are based on the sample size. For the energy bar data:

degrees of freedom = $ n - 1 = 31 - 1 = 30 $

The critical value of t with α = 0.05 and 30 degrees of freedom is +/- 2.042. Most statistics books have look-up tables for the distribution. You can also find tables online. The most likely situation is that you will use software and will not use printed tables.
We compare the value of our statistic (3.07) to the t value. Since 3.07 > 2.042, we reject the null hypothesis that the mean grams of protein is equal to 20. We make a practical conclusion that the labels are incorrect, and the population mean grams of protein is greater than 20.

Statistical details

Let’s look at the energy bar data and the 1-sample t-test using statistical terms.

Our null hypothesis is that the underlying population mean is equal to 20. The null hypothesis is written as:

$ H_0: \mathrm{\mu} = 20 $

The alternative hypothesis is that the underlying population mean is not equal to 20. The labels claiming 20 grams of protein would be incorrect. This is written as:

$ H_a: \mathrm{\mu} \neq 20 $

This is a two-sided test. We are testing if the population mean is different from 20 grams in either direction. If we can reject the null hypothesis that the mean is equal to 20 grams, then we make a practical conclusion that the labels for the bars are incorrect. If we cannot reject the null hypothesis, then we make a practical conclusion that the labels for the bars may be correct.

We calculate the average for the sample and then calculate its difference from the hypothesized population mean, μ:

$ \overline{x} - \mathrm{\mu} $

We calculate the standard error as:

$ \frac{s}{ \sqrt{n}} $

The formula shows the sample standard deviation as s and the sample size as n.

The test statistic uses the formula shown below:

$ t = \dfrac{\overline{x} - \mathrm{\mu}} {s / \sqrt{n}} $

We compare the test statistic to a t value with our chosen alpha value and the degrees of freedom for our data. Using the energy bar data as an example, we set α = 0.05. The degrees of freedom (df) are based on the sample size and are calculated as:

$ df = n - 1 = 31 - 1 = 30 $

Statisticians write the t value with α = 0.05 and 30 degrees of freedom as:

$ t_{0.05,30} $

The t value for a two-sided test with α = 0.05 and 30 degrees of freedom is +/- 2.042. There are two possible results from our comparison:

The test statistic is less extreme than the critical t values; in other words, the test statistic is neither less than -2.042 nor greater than +2.042. You fail to reject the null hypothesis that the mean is equal to the specified value. In our example, you would be unable to conclude that the label for the protein bars should be changed.
The test statistic is more extreme than the critical t values; in other words, the test statistic is less than -2.042, or is greater than +2.042. You reject the null hypothesis that the mean is equal to the specified value. In our example, you conclude that either the label should be updated or the production process should be improved to produce, on average, bars with 20 grams of protein.
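If no table or statistics package is at hand, the critical value can be approximated numerically. The sketch below is one way to do it with only the Python standard library: it integrates the t-distribution density with Simpson's rule and finds the critical value by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are our own, and dedicated statistical software would normally do this for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the area from 0 to x, computed with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Two-sided critical value c with P(|T| > c) = alpha, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha, df = 0.05, 30
t_stat = 3.07                      # test statistic from the worked example
critical = t_critical(alpha, df)
reject_null = abs(t_stat) > critical
print(f"critical value = +/-{critical:.3f}, reject the null hypothesis: {reject_null}")
```

The comparison `abs(t_stat) > critical` covers both tails at once, which is what the two-sided test requires.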

Testing for normality

The normality assumption is more important for small sample sizes than for larger sample sizes.

Normal distributions are symmetric, which means they are “even” on both sides of the center. Normal distributions do not have extreme values, or outliers. You can check these two features of a normal distribution with graphs. Earlier, we decided that the energy bar data was “close enough” to normal to go ahead with the assumption of normality. The figure below shows a normal quantile plot for the data, and supports our decision.

Figure 4: Normal quantile plot for energy bar data
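For readers without plotting software, the coordinates behind a normal quantile plot can be sketched with the Python standard library. The example pairs each sorted data value with a theoretical standard normal quantile and reports their correlation; a value close to 1 corresponds to points falling near a straight line. The plotting positions (i - 0.5) / n are one common convention and an assumption on our part; software may use a slightly different formula. The data are the rounded values from Table 1.

```python
import math
from statistics import NormalDist, mean

# Table 1 values, rounded to two decimals.
protein = [
    20.69, 27.46, 22.15, 19.85, 21.29, 24.75,
    20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
    22.14, 19.56, 21.10, 18.04, 24.12, 19.95,
    19.72, 18.28, 16.26, 17.46, 20.53, 22.12,
    25.06, 22.44, 19.08, 19.88, 21.39, 22.33, 25.79,
]

n = len(protein)
ordered = sorted(protein)
# Theoretical standard normal quantile for each ordered observation.
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(theoretical, ordered)
print(f"correlation between normal quantiles and sorted data: {r:.3f}")
```

Plotting `theoretical` against `ordered` gives the quantile plot itself. The correlation is only a rough numeric summary of how straight the plot is, not a formal test.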

You can also perform a formal test for normality using software. The figure below shows results of testing for normality with JMP software. We cannot reject the hypothesis of a normal distribution.

Figure 5: Testing for normality using JMP software

We can go ahead with the assumption that the energy bar data is normally distributed.

What if my data are not from a normal distribution?

If your sample size is very small, it is hard to test for normality. In this situation, you might need to use your understanding of the measurements. For example, for the energy bar data, the company knows that the underlying distribution of grams of protein is normally distributed. Even for a very small sample, the company would likely go ahead with the t-test and assume normality.

What if you know the underlying measurements are not normally distributed? Or what if your sample size is large and the test rejects normality? In this situation, you can use a nonparametric test. Nonparametric analyses do not depend on an assumption that the data values are from a specific distribution. For the one-sample t-test, one possible nonparametric alternative is the Wilcoxon signed-rank test.
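To give a feel for how the Wilcoxon signed-rank test works, here is a minimal sketch using only the Python standard library. It ranks the absolute differences from the hypothesized value, sums the ranks of the positive differences, and uses the large-sample normal approximation without a continuity correction. Those choices are assumptions made for illustration; statistical software may use exact tables or corrections and report a slightly different p-value. The sketch also assumes no tied absolute differences, which holds for the rounded Table 1 values. Note that this test addresses the center (median) of a symmetric distribution rather than the mean directly.

```python
import math
from statistics import NormalDist

# Table 1 values, rounded to two decimals.
protein = [
    20.69, 27.46, 22.15, 19.85, 21.29, 24.75,
    20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
    22.14, 19.56, 21.10, 18.04, 24.12, 19.95,
    19.72, 18.28, 16.26, 17.46, 20.53, 22.12,
    25.06, 22.44, 19.08, 19.88, 21.39, 22.33, 25.79,
]
hypothesized = 20.0

# Differences from the hypothesized value; exact zeros are dropped.
diffs = [x - hypothesized for x in protein if x != hypothesized]
n = len(diffs)

# Rank the absolute differences from smallest (rank 1) to largest (rank n).
order = sorted(range(n), key=lambda i: abs(diffs[i]))
w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

# Normal approximation to the distribution of the signed-rank statistic.
mean_w = n * (n + 1) / 4
sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w_plus - mean_w) / sd_w
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"W+ = {w_plus}, z = {z:.2f}, approximate two-sided p-value = {p_value:.4f}")
```

For the energy bar data, this approximate p-value is below 0.05, so the nonparametric test points to the same decision as the t-test.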

Understanding p-values

Using a visual, you can check to see if your test statistic is more extreme than a specified value in the distribution. The figure below shows a t-distribution with 30 degrees of freedom.

Figure 6: t-distribution with 30 degrees of freedom and α = 0.05

Since our test is two-sided and we set α = 0.05, the figure shows that the values of -2.042 and +2.042 “cut off” 5% of the area under the curve in the two tails combined, or 2.5% in each tail.

The next figure shows our results. You can see the test statistic falls above the specified critical value. It is far enough “out in the tail” to reject the hypothesis that the mean is equal to 20.

Figure 7: Our results displayed in a t-distribution with 30 degrees of freedom

Putting it all together with software

You are likely to use software to perform a t-test. The figure below shows results for the 1-sample t-test for the energy bar data from JMP software.

Figure 8: One-sample t-test results for energy bar data using JMP software

The software shows the null hypothesis value of 20 and the average and standard deviation from the data. The test statistic is 3.07. This matches the calculations above.

The software shows results for a two-sided test and for one-sided tests. We want the two-sided test. Our null hypothesis is that the mean grams of protein is equal to 20. Our alternative hypothesis is that the mean grams of protein is not equal to 20. The software shows a p-value of 0.0046 for the two-sided test. This p-value is the probability of observing a sample mean at least as far from 20 as our sample mean of 21.4, in either direction, when the underlying population mean is actually 20. A p-value of 0.0046 means there are about 46 chances out of 10,000 of seeing such a result. We feel confident in rejecting the null hypothesis that the population mean is equal to 20.
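The two-sided p-value can also be approximated without a statistics package. The sketch below, using only the Python standard library, recomputes the test statistic from the rounded Table 1 values and then measures the area in both tails of the t-distribution by numerical integration. The helper names are our own, and the result should land close to, but not necessarily exactly on, the value reported by software that uses the unrounded data.

```python
import math
import statistics

# Table 1 values, rounded to two decimals.
protein = [
    20.69, 27.46, 22.15, 19.85, 21.29, 24.75,
    20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
    22.14, 19.56, 21.10, 18.04, 24.12, 19.95,
    19.72, 18.28, 16.26, 17.46, 20.53, 22.12,
    25.06, 22.44, 19.08, 19.88, 21.39, 22.33, 25.79,
]
n = len(protein)
df = n - 1
std_error = statistics.stdev(protein) / math.sqrt(n)
t_stat = (statistics.mean(protein) - 20.0) / std_error

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|): one minus the area between -|t| and +|t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = area * h / 3      # area from 0 to |t|
    return 1 - 2 * half_area      # the density is symmetric around 0

p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, two-sided p-value = {p_value:.4f}")
```

A p-value below the chosen α = 0.05 leads to the same decision as comparing the test statistic with the critical value.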
