Multiple Comparisons

Use this option to obtain tests and confidence intervals that compare means defined by levels of your model effects. The goal when making multiple comparisons is to determine if group means differ, while controlling the probability of reaching an incorrect conclusion. The Multiple Comparisons option lets you compare group means with an overall average mean (Analysis of Means) and with a control group mean. You can also conduct pairwise comparisons using either Tukey HSD or Student’s t. To identify pairwise differences that are of practical importance, you can perform equivalence tests.

The Student’s t method only controls the error rate for an individual comparison. As such, it is not a true multiple comparison procedure. All other methods provided control the overall error rate for all comparisons of interest. Each of these methods uses a multiple comparison adjustment in calculating p-values and confidence limits.

If your model contains nominal and ordinal effects, you can conduct comparisons using Least Squares Means estimates, or you can define specific comparisons using User-Defined Estimates. If your model contains only continuous effects, you can compare means using User-Defined Estimates.

Note: Suppose that a continuous effect consists of relatively few levels. If you are interested in comparisons using Least Squares Means Estimates, consider assigning that effect an ordinal (or nominal) modeling type.

Launch the Option

An example of the control window for the Multiple Comparisons option is shown in Launch Window for Least Squares Means Estimates. This example is based on the Big Class.jmp data table, with weight as Y and age, sex, and height as model effects. Two classes of estimates are available for comparisons: Least Squares Means Estimates and User-Defined Estimates.

Least Squares Means Estimates

This option compares least squares means and is available only if there are nominal or ordinal effects in the model. Recall that least squares means are means computed at some neutral value of the other effects in the model. (For a definition of least squares means, see LSMeans Table.) You must select the effect of interest. In Launch Window for Least Squares Means Estimates, Least Squares Means Estimates for age are specified.

Launch Window for Least Squares Means Estimates

User-Defined Estimates

The specification of User-Defined Estimates is illustrated in Launch Window for User-Defined Estimates. Three levels of age and both levels of sex have been selected. Also, two values of height have been manually entered. The Add Estimates button has been clicked, resulting in the listing of all possible combinations of the specified levels. At this point, you can specify more estimates and click the Add Estimates button again to add them to the list of Estimates for Comparison.

Launch Window for User-Defined Estimates

When you use User-Defined Estimates, effects with no specified levels are set as follows:

• Continuous effects are set to the mean of the effect.

• Nominal and ordinal effects are set to the first level in the value ordering.
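The way specified levels expand into a list of estimates, with defaults filled in for unspecified effects, can be sketched in Python. This is only an illustration of the rules described above, not JMP's implementation. The effect summary (level lists and the mean of height) and the function names are hypothetical.

```python
from itertools import product

# Hypothetical effect summary for a model like the Big Class example.
# The level lists and the mean height are illustrative values, not JMP output.
MODEL_EFFECTS = {
    "age": {"type": "ordinal", "levels": [12, 13, 14, 15, 16, 17]},
    "sex": {"type": "nominal", "levels": ["F", "M"]},
    "height": {"type": "continuous", "mean": 62.55},
}


def default_setting(effect):
    """Return the value used for an effect with no specified levels."""
    if effect["type"] == "continuous":
        return effect["mean"]      # continuous: set to the mean of the effect
    return effect["levels"][0]     # nominal/ordinal: first level in the value ordering


def build_estimates(specified):
    """List every combination of the specified levels, filling in defaults."""
    names = list(MODEL_EFFECTS)
    choices = [
        specified.get(name) or [default_setting(MODEL_EFFECTS[name])]
        for name in names
    ]
    return [dict(zip(names, combo)) for combo in product(*choices)]


# Three ages, both sexes, and two heights give 3 x 2 x 2 = 12 estimates.
estimates = build_estimates({"age": [12, 13, 14], "sex": ["F", "M"], "height": [55, 65]})

# Only sex specified: age falls back to its first level, height to its mean.
partial = build_estimates({"sex": ["F", "M"]})
```

In the second call, each of the two estimates holds age at its first level and height at its mean, which mirrors the defaults listed above.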

Note: In this section, we will use the term mean to refer to either estimates of least squares means or user-defined estimates.

Choose Initial Comparisons

Once you have specified estimates, you can choose the types of comparisons that you would like to see in your initial report by making selections under Choose Initial Comparisons. Or click OK without making any selections.

Comparisons with Overall Average - ANOM

Compares each effect least squares mean with the overall average least squares mean (Analysis of Means).

Comparisons with Control - Dunnett’s

Compares each effect least squares mean with the least squares mean of a control level.

All Pairwise Comparisons - Tukey HSD

Tests all pairwise comparisons of the effect least squares means using the Tukey HSD adjustment for multiplicity.

All Pairwise Comparisons - Student’s t

Tests all pairwise comparisons of the effect least squares means with no multiplicity adjustment.

Each of these selections opens a report with an area at the top that shows details specific to the report. This information includes the quantile, or critical value. For the true multiple comparisons procedures, the method used for the multiple comparison adjustment is shown. If you have specified User-Defined Estimates, the report displays a list of effects that do not vary relative to the specified estimates and the levels at which these effects are set. Unless you have specified otherwise, any continuous effect is set to its mean. Any nominal or ordinal effect is set to the first level in its value ordering.

If you click OK without selecting from the Choose Initial Comparisons list, the Multiple Comparisons report opens, showing the Least Squares Means Estimates table or the User-Defined Estimates table. From the Multiple Comparisons red triangle menu, all of the options listed above are available. The available reports and options are described below.

Least Squares Means or User-Defined Estimates Report

By default, the Multiple Comparisons option displays a Least Squares Means Estimates report or a User-Defined Estimates report, depending on the type of estimates you selected in the launch window. For each combination of levels of interest, this table gives an estimate of the mean, as well as a test and confidence interval. Specifically, this table gives the following:

Levels of the Categorical Effects

The first columns in the report identify the effect or effects of interest. The values in the columns specify the groups being analyzed.

Estimate

Gives an estimate of the mean for each group.

Std Error

Gives the standard error of the mean for each group.

DF

Shows the degrees of freedom for a test of whether the mean is 0.

Lower 95%

Shows the lower confidence limit for the mean. You can change the confidence level by selecting Set Alpha Level in the Fit Model window.

Upper 95%

Shows the upper confidence limit for the mean.

t Ratio

Shows the t ratio for the significance test.

Prob>|t|

Gives the p-value for the significance test.

Note: You can obtain t ratios and p-values by right-clicking in the table and selecting Columns.

Comparisons with Overall Average

This option compares the means for the specified levels to the overall mean for these levels. It displays a table showing confidence intervals for differences from the overall mean and a chart showing decision limits. The method used to make the comparisons is called analysis of means (ANOM) (Nelson et al., 2005). ANOM is a multiple comparison procedure that controls the joint error rate for all comparisons of group means to the overall mean. See Comparisons with Overall Average for Ratings for a report based on the Movies.jmp sample data table.

ANOM might appear similar to analysis of variance. However, it is fundamentally different in that it identifies levels with means that differ from the overall mean for all levels. In contrast, analysis of variance tests for differences in the means themselves.

At the top of the Comparisons with Overall Average report you find:

Quantile

The value of Nelson’s h statistic used in constructing the decision limits.

Avg

The average mean. For least squares estimates, the average mean is a weighted average of the group least squares means that represents the overall mean at the neutral settings where the group least squares means are calculated.

Specifically, the average least squares mean is a weighted average with weights inversely proportional to the diagonal entries of the matrix L(X′X)⁻L′. Here L is the matrix of coefficients used to compute the group least squares means, X is the design matrix, and (X′X)⁻ denotes a generalized inverse. For a technical definition of least squares means and the average least squares mean, see the GLM Procedure section in the SAS/STAT 9.3 User’s Guide. Search for “Construction of Least Squares Means”.

For user-defined estimates, the average mean is defined similarly. However, in this case, L is the matrix of coefficients used to define the estimates.
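As a minimal sketch of the weighting just described, the following Python function computes an average mean from group estimates and the corresponding diagonal entries. The function name and the numeric inputs are hypothetical. The diagonal entries are assumed to have been obtained elsewhere.

```python
def average_mean(estimates, diagonal_entries):
    """Weighted average with weights inversely proportional to the diagonal entries."""
    if len(estimates) != len(diagonal_entries):
        raise ValueError("need one diagonal entry per estimate")
    weights = [1.0 / d for d in diagonal_entries]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total


# Hypothetical group means and diagonal entries (not taken from any JMP report).
group_means = [10.0, 20.0, 30.0]
diagonal = [1.0, 2.0, 4.0]
avg = average_mean(group_means, diagonal)

# When all diagonal entries are equal, the result is the simple average.
balanced_avg = average_mean([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Groups whose estimates are more precise (smaller diagonal entries) pull the average mean toward themselves. With equal entries, every group contributes equally.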

Adjustment

Describes the method used to obtain the critical value:

– Nelson: Provides exact critical values and p-values. Used whenever possible, in particular, when the estimates are uncorrelated.

– Nelson-Hsu: Provides approximate critical values and p-values based on Hsu’s factor analytical approximation (Hsu, 1992). Used when exact values cannot be obtained.

– Sidak: Used when both Nelson and Nelson-Hsu fail.

For technical details, see the GLM Procedure section in the SAS/STAT 9.3 User’s Guide. Search for “Approximate and Simulation-Based Methods”.
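For orientation, the standard Šidák adjustment can be sketched in Python. It converts an overall significance level for m comparisons into a per-comparison level. How JMP computes its Sidak quantile internally is not described here. The normal quantile below is a large-sample stand-in for a t quantile, and the function names are hypothetical.

```python
from statistics import NormalDist


def sidak_alpha(alpha, m):
    """Per-comparison level that keeps the overall level at alpha for m comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_quantile_normal(alpha, m):
    """Two-sided critical value, ASSUMING a large-sample normal approximation."""
    per_comparison = sidak_alpha(alpha, m)
    return NormalDist().inv_cdf(1.0 - per_comparison / 2.0)


# Four comparisons at an overall level of 0.05.
per_comparison_level = sidak_alpha(0.05, 4)
critical_value = sidak_quantile_normal(0.05, 4)
```

With a single comparison, the adjustment leaves the level unchanged and the normal critical value is the familiar 1.96. With more comparisons, the critical value grows.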

Three options are available from the Comparisons with Overall Average report menu:

Differences from Overall Average

For each comparison of a group’s mean to the overall mean, this report provides the following details:

• Difference - the estimated difference

• Std Error - the standard error of the difference

• DF - the degrees of freedom used in constructing the confidence interval

• Lower and Upper limits for the confidence interval

Comparisons with Overall Average Decision Chart

This decision chart plots a point at the mean for each group. A horizontal line is plotted at the average mean. Upper and lower decision limits are plotted. Suppose that a point corresponding to a group mean falls outside these limits. This occurrence indicates that the group mean differs from the overall mean, based on the analysis of means test at the specified significance level. The significance level is shown below the chart.
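The logic of the decision chart can be sketched in Python. The sketch assumes the usual ANOM construction, in which each group's decision limits are the average mean plus or minus the quantile times the standard error of that group's difference from the average. All names and numbers are hypothetical, and the quantile is supplied as an input rather than computed.

```python
def decision_chart(group_means, diff_std_errors, average, quantile):
    """Return decision limits and the limit exceeded (if any) for each group."""
    rows = []
    for name, mean in group_means.items():
        half_width = quantile * diff_std_errors[name]
        lower, upper = average - half_width, average + half_width
        if mean > upper:
            exceeded = "Upper"
        elif mean < lower:
            exceeded = "Lower"
        else:
            exceeded = None  # inside the limits: no evidence of a difference
        rows.append({"group": name, "estimate": mean,
                     "lower": lower, "upper": upper, "exceeded": exceeded})
    return rows


# Hypothetical inputs: three groups, a common standard error, and a quantile of 2.5.
summary = decision_chart(
    group_means={"A": 10.0, "B": 14.0, "C": 6.0},
    diff_std_errors={"A": 1.0, "B": 1.0, "C": 1.0},
    average=10.0,
    quantile=2.5,
)
```

The returned rows resemble a summary report: an estimate, decision limits, and the limit exceeded for each group.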

The Comparisons with Overall Average Decision Chart report menu has these options:

Show Summary Report

Produces a table showing the estimate, decision limits, and the limit exceeded for each group.

Display Options

Gives several options for controlling the display of the chart.

Calculate P-Values

Adds columns giving t ratios (t Ratio) and p-values (Prob>|t|) to the Comparisons with Overall Average report. Note that computing exact critical values and p-values for unbalanced designs requires complex integration and can be computationally challenging. When calculations for such a quantile fail, the Sidak quantile is computed but p-values are not available.

Example of Comparisons with Overall Average

Consider the Movies.jmp sample data table. You are interested in whether any of the four Rating categories are unusual in that their mean Domestic $ revenues differ from the overall average revenue. You specify a model with Domestic $ as the response and Type, Rating, and Year as model effects.

1. Select Help > Sample Data Library and open Movies.jmp.

2. Select Analyze > Fit Model.

3. Select Domestic $ and click Y.

4. Select Type, Rating, and Year, and click Add.

5. Click Run.

6. From the red triangle next to Response Domestic $, select Estimates > Multiple Comparisons.

7. From the Choose an Effect list, select Rating.

8. In the Choose Initial Comparisons list, select Comparisons with Overall Average.

9. Click OK.

10. From the Comparisons with Overall Average red triangle menu, select Calculate P-Values.

The results shown in Comparisons with Overall Average for Ratings indicate that the least squares means for movies with a Rating of PG-13 and R differ significantly from the overall average in terms of Domestic $.

Comparisons with Overall Average for Ratings

Comparisons with Control

If you select Comparisons with Control - Dunnett’s, a window opens, asking you to specify a control group. If you selected Least Squares Means Estimates, the list consists of all levels of the effect that you selected. If you selected User-Defined Estimates, the list consists of the combinations of effect levels that you specified.

After you choose a control group and click OK, the Comparisons with Control report appears in your Fit Least Squares report. This option compares the means for the specified settings to the control group mean. It displays a table showing confidence intervals for differences from the control group and a chart showing decision limits. Dunnett’s method is used to make the comparisons. Dunnett’s method is a multiple comparison procedure that controls the error rate over all comparisons (Hsu, 1996 and Westfall et al., 2011).

When exact calculation of p-values and confidence intervals is not possible, Hsu’s factor analytical approximation is used (Hsu, 1992). Note that computing exact critical values and p-values for unbalanced designs requires complex integration and can be computationally intensive. When calculations for such a quantile fail, the Sidak quantile is computed.

In addition to the list of effects that do not vary for the specified estimates, at the top of the Comparisons with Control report you also find:

Quantile

The critical value for Dunnett’s test.

Control

The setting that defines the control group. This is a single level if you have selected a single effect; it is a combination of levels if you specified a user-defined combination of more than one effect.

Adjustment

The method used to obtain the critical value:

– Dunnett: Provides exact critical values and p-values. Used whenever possible, in particular, when the estimates are uncorrelated.

– Dunnett-Hsu: Provides approximate critical values and p-values based on Hsu’s factor analytical approximation (Hsu, 1992). Used when exact values cannot be obtained.

– Sidak: Used when both Dunnett and Dunnett-Hsu fail.

For technical details, see the GLM Procedure section in the SAS/STAT 9.3 User’s Guide. Search for “Approximate and Simulation-Based Methods”.

Three options are available from the Comparisons with Control report menu:

Differences from Control

For each comparison of a group mean to the control mean, this report provides the following details:

• Difference - the estimated difference

• Std Error - the standard error of the difference

• DF - the degrees of freedom used in constructing the confidence interval

• Lower and Upper limits for the confidence interval

Comparisons with Control Decision Chart

This decision chart plots a point at the mean for each group being compared to the control group. A horizontal line shows the mean for the control group. Upper and lower decision limits are plotted. When a point falls outside these limits, it corresponds to a group whose mean differs from the control group mean based on Dunnett’s test at the specified significance level. That level is shown beneath the chart.

The Comparisons with Control Decision Chart report menu has these options:

Show Summary Report

Produces a table showing the estimate, decision limits, and the limit exceeded for each group.

Display Options

Gives several options for controlling the display of the chart.

Calculate P-Values

Adds columns giving t ratios (t Ratio) and p-values (Prob>|t|) to the Comparisons with Control report. Note that computing exact critical values and p-values for unbalanced designs requires complex integration and can be computationally challenging. When calculations for such a quantile fail, the Sidak quantile is computed but p-values are not available.

All Pairwise Comparisons

The All Pairwise Comparisons option shows either a Tukey HSD All Pairwise Comparisons or Student’s t All Pairwise Comparisons report (Hsu, 1996 and Westfall et al., 2011). Tukey HSD comparisons are constructed so that the significance level applies jointly to all pairwise comparisons. In contrast, for Student’s t comparisons, the significance level applies to each individual comparison. When making several pairwise comparisons using Student’s t tests, the risk that one of the comparisons incorrectly signals a difference can well exceed the stated significance level.
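A short Python calculation illustrates how quickly this risk grows. It assumes, purely for simplicity, that the individual comparisons are independent. Pairwise comparisons of group means generally are not independent, so the figure is a rough guide rather than an exact error rate. The function names are hypothetical.

```python
from math import comb


def pairwise_count(k):
    """Number of pairwise comparisons among k groups."""
    return comb(k, 2)


def familywise_error_if_independent(alpha, m):
    """Chance of at least one false signal in m unadjusted tests, ASSUMING independence."""
    return 1.0 - (1.0 - alpha) ** m


m = pairwise_count(4)                               # four groups give six comparisons
risk = familywise_error_if_independent(0.05, m)     # roughly 0.26 rather than 0.05
```

Under this simplification, six unadjusted comparisons at the 0.05 level carry roughly a one-in-four chance of at least one false signal.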

At the top of the Tukey HSD All Pairwise Comparisons report you find:

Quantile

The critical value for the test. Note that, for Tukey HSD, the quantile is q/√2, where q is the appropriate percentage point of the Studentized range statistic.
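The scaling of the Studentized range percentage point can be sketched in Python. The sketch assumes the usual relation in which the Tukey HSD quantile equals q divided by the square root of 2, so that it multiplies the standard error of a difference in the same way a t quantile does. The value of q is treated as an input (for example, from a table), and the numbers below are illustrative only.

```python
from math import sqrt


def tukey_quantile(q):
    """Scale a Studentized range percentage point q to the t-ratio scale."""
    return q / sqrt(2.0)


def tukey_interval(difference, std_error, q):
    """Confidence limits for a pairwise difference using the scaled quantile."""
    half_width = tukey_quantile(q) * std_error
    return difference - half_width, difference + half_width


# Illustrative inputs: q = 3.314, estimated difference 12.0, standard error 4.0.
quantile = tukey_quantile(3.314)
lower, upper = tukey_interval(12.0, 4.0, 3.314)
```

Because the quantile is on the t-ratio scale, it can be compared directly with the Student’s t critical value to see how much wider the adjusted intervals are.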

Adjustment

Describes the method used to obtain the critical value:

– Tukey: Provides exact critical values and p-values. Used when the means are uncorrelated and have equal variances, or when the design is variance-balanced.

– Tukey-Kramer: Provides approximate critical values and p-values. Used when exact values cannot be obtained.

For technical details, see the GLM Procedure section in the SAS/STAT 9.3 User’s Guide. Search for “Approximate and Simulation-Based Methods”.

At the top of the Student’s t All Pairwise Comparisons report you find the Quantile, or critical value, for the t test.

All Pairwise Differences Report

Both Tukey HSD and Student’s t compare all pairs of levels. For each pairwise comparison, the All Pairwise Differences report shows:

• The levels being compared

• Difference - the estimated difference between the means

• Std Error - the standard error of the difference

• DF - the degrees of freedom used in constructing the confidence interval

• t Ratio - the t ratio for the test of whether the difference is zero

• Prob > |t| - the p-value for the test

• Lower and Upper limits for a confidence interval for the difference in means

All Pairwise Comparisons Scatterplot

This plot, sometimes called a diffogram or a mean-mean scatterplot, displays the confidence intervals for all pairwise differences of means. (See All Pairwise Comparisons Scatterplot for User-Defined Comparisons for an example.) Colors indicate which differences are significant.

The plot shows a reference line as an upwardly sloping line on the diagonal. This line represents points where the two means are equal. Each line segment corresponds to a confidence interval for a pairwise comparison. The coordinates of the point displayed on the line segment are the means for the corresponding groups. Placing your cursor over one of these points displays a tooltip identifying the groups being compared and showing the estimated difference. If a line segment crosses the line on the diagonal, then the means can be equal and the comparison is not significant.
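The reading rule for the plot can be sketched in Python: a segment crosses the diagonal exactly when the confidence interval for the difference contains zero. The function name, group labels, means, and interval limits below are hypothetical and are not taken from any JMP report.

```python
def diffogram_segments(means, intervals):
    """Describe each pairwise segment: its plotted point and whether it is significant."""
    segments = []
    for (first, second), (lower, upper) in intervals.items():
        segments.append({
            "pair": (first, second),
            "point": (means[first], means[second]),        # coordinates of the plotted point
            "difference": means[first] - means[second],
            "significant": not (lower <= 0.0 <= upper),    # interval excludes zero
        })
    return segments


# Hypothetical group means and confidence limits for the pairwise differences.
means = {"A": 80.0, "B": 60.0, "C": 58.0}
intervals = {
    ("A", "B"): (5.0, 35.0),
    ("A", "C"): (7.0, 37.0),
    ("B", "C"): (-13.0, 17.0),
}
segments = diffogram_segments(means, intervals)
```

Here the B versus C interval contains zero, so that segment would cross the diagonal and the comparison is not significant.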

Equivalence Tests

Use this option to conduct one or more equivalence tests. Equivalence tests are useful when you want to detect differences that are of practical interest. You are asked to specify a threshold difference for group means for which smaller differences are considered practically equivalent. In other words, if two group means differ by this amount or less, you are willing to consider them equivalent.

Once you have specified this value, the Equivalence Tests report appears. The bounds that you have specified are given at the top of the report. The report consists of a table giving the equivalence tests and a scatterplot that displays them. The equivalence tests and confidence intervals are based on Tukey HSD or Student’s t critical values, corresponding to the option that you selected.

Equivalence TOST Tests

The Two One-Sided Tests (TOST) method is used to test for a practical difference between the means (Schuirmann, 1987). Two one-sided pooled-variance t tests are constructed for the null hypotheses that the true difference exceeds the threshold values. If both tests reject, the difference in the means does not statistically exceed either threshold value. Therefore, the groups are considered practically equivalent. If only one or neither test rejects, then the groups might not be practically equivalent.

For each comparison, the Equivalence TOST Tests report gives the following information:

• Difference - the estimated difference in the means

• Lower Bound t Ratio, Upper Bound t Ratio - the lower and upper bound t ratios for the two one-sided pooled-variance significance tests

• Lower Bound p-Value, Upper Bound p-Value - p-values corresponding to the lower and upper bound t ratios

• Maximum p-Value - the maximum of the lower and upper bound p-values

• Lower and Upper limits for a confidence interval for the difference in the means
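The quantities in this report can be illustrated with a simplified Python sketch of the TOST logic. The sketch uses a large-sample normal approximation for the p-values, which is an assumption made here to keep the example dependency-free. The report itself is based on t ratios with Tukey HSD or Student’s t critical values. The function name and inputs are hypothetical.

```python
from statistics import NormalDist


def tost(difference, std_error, threshold, alpha=0.05):
    """Two one-sided tests for equivalence, ASSUMING a normal approximation."""
    # Lower bound test: null hypothesis is that the true difference <= -threshold.
    lower_ratio = (difference + threshold) / std_error
    # Upper bound test: null hypothesis is that the true difference >= +threshold.
    upper_ratio = (difference - threshold) / std_error
    z = NormalDist()
    lower_p = 1.0 - z.cdf(lower_ratio)
    upper_p = z.cdf(upper_ratio)
    max_p = max(lower_p, upper_p)
    return {
        "lower_ratio": lower_ratio,
        "upper_ratio": upper_ratio,
        "lower_p": lower_p,
        "upper_p": upper_p,
        "max_p": max_p,
        "equivalent": max_p < alpha,   # both one-sided tests must reject
    }


# Hypothetical inputs with a practical-difference threshold of 50.
close_pair = tost(difference=10.0, std_error=5.0, threshold=50.0)
distant_pair = tost(difference=45.0, std_error=10.0, threshold=50.0)
```

Only the maximum p-value matters for the conclusion: the groups are declared practically equivalent when both one-sided tests reject. In the second call, the difference is inside the threshold but too imprecisely estimated to establish equivalence.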

Equivalence Tests Scatterplot

Using colors, this scatterplot indicates which means are practically equivalent and which are not as determined by the equivalence test. (See Equivalence Tests Scatterplot.)

The plot shows a solid reference line on the diagonal as well as a shaded reference band. The width of the band is twice the practical difference. Each line segment corresponds to a confidence interval for a pairwise comparison. The coordinates of the point on the line segment are the means for the corresponding groups. Placing your cursor over one of these points displays a tooltip indicating the groups being compared and the estimated difference. When a line segment is entirely contained within the diagonal band, it follows that the means are practically equivalent.

Remove

This option removes the Equivalence Tests report.

Example of Tukey HSD All Pairwise Comparisons

Consider the Movies.jmp sample data table. You are interested in Domestic $ differences for action and drama movies across two Rating categories, PG-13 and R, in the year 1998.

1. Select Help > Sample Data Library and open Movies.jmp.

2. Select Analyze > Fit Model.

3. Select Domestic $ and click Y.

4. Select Type, Rating, and Year, and click Add.

5. Click Run.

6. From the red triangle next to Response Domestic $, select Estimates > Multiple Comparisons.

7. From the Type of Estimates list, click User-Defined Estimates.

8. From the Choose Type levels list, select Action (Action should already be selected by default) and Drama.

9. From the Choose Rating levels list, select PG-13 and R.

10. In the list entitled Year, enter the year 1998.

11. Click Add Estimates. Note that all possible combinations of the levels you specified are now displayed beneath the Add Estimates button.

12. In the Choose Initial Comparisons list, select All Pairwise Comparisons - Tukey HSD.

Check that your window is populated as shown in Populated User-Defined Estimates Window.

Populated User-Defined Estimates Window

13. Click OK.

The All Pairwise Differences report indicates that three of the six pairwise comparisons are significant. The All Pairwise Comparisons Scatterplot, shown in All Pairwise Comparisons Scatterplot for User-Defined Comparisons, shows the confidence intervals for these comparisons in red. Also shown is the tooltip for one of these intervals, indicating that the interval compares Action, Rating R movies to Drama, Rating PG-13 movies, and that the mean difference in Domestic $ is -53.58.

All Pairwise Comparisons Scatterplot for User-Defined Comparisons

14. From the Tukey HSD All Pairwise Comparisons report’s red triangle menu, select Equivalence Tests.

15. In the text box that appears, enter 50.

16. Click OK.

TOST tests are conducted to determine which movie categories are equivalent, given that you consider categories that differ by less than 50 in units of Domestic $ to be equivalent. The Equivalence Tests Scatterplot (Equivalence Tests Scatterplot) indicates that two pairs of movie categories can be considered equivalent.

Equivalence Tests Scatterplot