Multivariate Inliers and Outliers

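This report calculates Mahalanobis distance based on available data, using the equation $D_i = \sqrt{(x_i - \bar{x})^{T} S^{-1} (x_i - \bar{x})}$ (where $x_i$ is the vector of values for subject $i$, $\bar{x}$ is the multivariate mean, and $S$ is the sample covariance matrix), to identify subject inliers and outliers in multivariate space relative to the multivariate mean. Refer to the JMP documentation on Mahalanobis Distance Measures for statistical details. The report also generates results by site so that you can see which sites are extreme in this multivariate space.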

Mahalanobis distance is plotted on the log scale to allow for easier examination of small scores. The reference line is derived from a transformation of the mean of the approximate chi-square distribution.
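The distance computation described above can be illustrated with a minimal sketch in plain Python. This is not the report's implementation. The function names are hypothetical, the sample covariance uses an n - 1 divisor, and every subject is assumed to have complete data.

```python
import math

def mean_vector(rows):
    """Column means of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, mean):
    """Sample covariance matrix (n - 1 divisor); an assumption of this sketch."""
    n, p = len(rows), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in r] for r in cov]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis_distances(rows):
    """Distance of each row (subject) from the multivariate mean."""
    mean = mean_vector(rows)
    cov = covariance_matrix(rows, mean)
    distances = []
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        z = solve(cov, d)  # z = S^-1 d, without forming the inverse
        distances.append(math.sqrt(sum(di * zi for di, zi in zip(d, z))))
    return distances

# Made-up data: five subjects, two variables.
subjects = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [10.0, 9.0]]
for dist in mahalanobis_distances(subjects):
    print(round(dist, 3), round(math.log(dist), 3))  # raw and log-scale value
```

Taking the logarithm of each distance, as in the last line, spreads out the small scores in the way the plot described above is intended to. The reference line is not reproduced here because its exact transformation is not given in this section.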

This report attempts to use as much data as possible. Along with sex and age, it takes all findings test codes by visit number and time number (if available), as well as frequencies of all event and intervention codes per subject. Doing so can lead to missing data, particularly for studies that do not have a fixed number of visits or that have many dropouts. Because Mahalanobis distance cannot be calculated when a large amount of data is missing, there is an option to delete variables with at least X% of missing data¹ based on the selected population and filters (default of 5%). Of the remaining variables, scores are computed for those subjects with complete data. The general strategy of this report is to use as many variables as possible, while letting a few early dropouts fall out of the analysis.

For the Nicardipine example shown here, 17 out of 512 variables have missing data rates below 5% and are kept. Fifteen of these variables have missing data, which prevents Mahalanobis distance from being calculated for 50 subjects.
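The variable-filtering strategy described above can be sketched as follows. This is an illustration only: the function names, the dictionary layout, and the use of None for a missing value are assumptions of the sketch, not part of the report.

```python
def select_variables(records, threshold_pct=5.0):
    """Split variables into kept and dropped by missing-data percentage.

    records maps a subject ID to a dict of variable name -> value,
    with None marking a missing value (a convention of this sketch).
    A variable is dropped when at least threshold_pct of subjects lack it.
    """
    n = len(records)
    variables = sorted({v for row in records.values() for v in row})
    kept, dropped = [], []
    for var in variables:
        missing = sum(1 for row in records.values() if row.get(var) is None)
        rate = 100.0 * missing / n
        (dropped if rate >= threshold_pct else kept).append(var)
    return kept, dropped

def complete_subjects(records, kept):
    """Subjects with no missing value among the kept variables."""
    return [
        subject
        for subject, row in records.items()
        if all(row.get(var) is not None for var in kept)
    ]

# Made-up data; a 50% threshold is used only to keep the example small.
records = {
    "S1": {"AGE": 50, "DIABP_V1": 80.0, "DIABP_V9": None},
    "S2": {"AGE": 61, "DIABP_V1": None, "DIABP_V9": None},
    "S3": {"AGE": 47, "DIABP_V1": 75.0, "DIABP_V9": 70.0},
    "S4": {"AGE": 55, "DIABP_V1": 82.0, "DIABP_V9": None},
}
kept, dropped = select_variables(records, threshold_pct=50.0)
print(kept, dropped)
print(complete_subjects(records, kept))
```

In this toy data, DIABP_V9 is removed because most subjects lack it, and subject S2 falls out of the analysis because a kept variable is still missing. This mirrors the trade-off between keeping many variables and losing a few subjects.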

Running this report for Nicardipine using default settings generates the Report shown below.

The report initially shows Mahalanobis Distance and Missing Data.

• Mahalanobis Distance: Presents plots of the Mahalanobis distance of all subjects (measured from the multivariate mean), colored by study site, along with box plots by site.

• Missing Data: Details variables that contain missing data that prevented Mahalanobis distance from being calculated for certain subjects (Flag = 1) and variables that were dropped from the analysis based on the Remove variables from analysis with a missing data percentage of at least: dialog option. By default, variables with 5% or more missing data are not used in the calculation of Mahalanobis distance.

Drill Down Buttons

Drill down buttons provide you with an easy way to drill down into your data. The following drill down buttons are generated by this report:

• Profile Subjects: Select subjects and click to generate the patient profiles. See Profile Subjects for additional information.

• Show Subjects: Select subjects and click to open the ADSL (or DM, if ADSL is unavailable) data for the selected subjects.

• Cluster Subjects: Select subjects and click to cluster them using data from available domains. See Cluster Subjects for additional information.

• Demographic Counts: Select subjects and click to create a subject filter. Follow-up analyses are subset to these subjects if the Subject Filter is applied in the dialog.

General

• Click to view the associated data tables. Refer to

Output includes one summary data set (named csass_sum_XXX², by default) containing one record per subject with pre-dosing data; one data set of all pairwise distances within the covariate subgroups (named csass_alldist_XXX, by default); one data set containing minimum pairwise distances for each covariate subgroup (named csass_mindist_XXX, by default); one data set per covariate subgroup containing pairwise distances (named csass_p_Y_XXX, by default, where Y is indexed from 1 to the number of covariate subgroups); and one data set per covariate subgroup containing the distance matrix of subjects within the covariate subgroup (named csass_Y_XXX, by default, where Y is indexed from 1 to the number of covariate subgroups).

Variable names for Findings data are formed by concatenating the abbreviation of the Findings test and the visit number (V). For example, DIABP_V2 is the diastolic blood pressure at visit 2. If there are multiple measurements at visit 2, the value is their average. If there are multiple time points on a single visit, a time number is appended. For example, DIABP_V2_T1 is the diastolic blood pressure at time point 1 at visit 2 (or the average, if multiple measurements are taken); DIABP_V2_T2 is the diastolic blood pressure at time point 2 at visit 2.

• Click to generate a standardized PDF- or RTF-formatted report containing the plots and charts of selected sections.

• Click to take notes and store them in a central location. Refer to Add Notes for more information.

• Click to read user-generated notes. Refer to View Notes for more information.

• Click the Options arrow to reopen the completed report dialog used to generate this output.

• Click the gray border to the left of the Options tab to open a dynamic report navigator that lists all of the reports in the review. Refer to Report Navigator for more information.
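The naming convention for Findings variables described earlier in this section (for example, DIABP_V2 and DIABP_V2_T1) can be sketched as follows. The function names and the tuple layout are hypothetical, and passing None as the time number to mean "only one time point on the visit" is an assumption of this sketch.

```python
from collections import defaultdict

def findings_variable_name(test_code, visit, time_number=None):
    """Build a name such as DIABP_V2 or DIABP_V2_T1."""
    name = f"{test_code}_V{visit}"
    if time_number is not None:
        name += f"_T{time_number}"
    return name

def summarize_findings(measurements):
    """Average repeated measurements that map to the same variable name.

    measurements is an iterable of (test_code, visit, time_number, value).
    """
    groups = defaultdict(list)
    for test_code, visit, time_number, value in measurements:
        groups[findings_variable_name(test_code, visit, time_number)].append(value)
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}

# Made-up measurements for one subject.
measurements = [
    ("DIABP", 2, None, 80.0),
    ("DIABP", 2, None, 84.0),  # second measurement at visit 2 is averaged
    ("DIABP", 3, 1, 78.0),
    ("DIABP", 3, 2, 76.0),
]
print(summarize_findings(measurements))
```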

¹ Missing data is shown in the tabular data pane, which can be viewed by clicking the drill down option. A Missing Flag value of 1 indicates variables with a missing rate below 5%. A Missing Flag value of 2 indicates variables with a missing rate of 5% or more, based on the default dialog option of 5%.

² The _XXX suffix denotes a one- to three-digit number that is added sequentially to prevent overwriting existing data sets.