This section gives the derivation of the formulas saved by Score Options > Save Formulas. The formulas depend on the selected Discriminant Method.
For each group defined by the categorical variable X, observations on the covariates are assumed to have a p-dimensional multivariate normal distribution, where p is the number of covariates. The notation used in the formulas is given in Notation for Formulas Given by Save Formulas Options.
y
p by 1 vector of covariates for an observation

ybar
p by 1 vector of means for the covariates across all observations

nt
number of observations in group t

qt
prior probability of membership in group t

A
In linear discriminant analysis, all within-group covariance matrices are assumed equal. The common covariance matrix is estimated by Sp. See Notation for Formulas Given by Save Formulas Options for notation.
Note that the number of parameters that must be estimated is p^2 for the pooled covariance matrix plus pT for the means.
The squared Mahalanobis distance from an observation y to group t is defined as follows:

d_t^2(y) = (y - ybar_t)' Sp^(-1) (y - ybar_t)

where ybar_t is the p by 1 vector of covariate means for group t. The posterior probability of membership in group t is given as follows:

Prob(t|y) = qt exp(-d_t^2(y)/2) / Σ_u qu exp(-d_u^2(y)/2)

An observation y is assigned to the group for which its posterior probability is the largest.
SqDist[<group t>]
The squared Mahalanobis distance from the observation to the mean of group t.

Prob[<group t>]
The posterior probability that the observation belongs to group t.

Pred <X>
The predicted group: the level of X for which the posterior probability is largest.
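As an illustration of the linear scoring described above, the following is a minimal numpy sketch (the function name and argument layout are illustrative, not part of the saved formulas). It computes the squared distances, posterior probabilities, and predicted group index under a shared within-group covariance matrix and prior probabilities.

```python
import numpy as np

def linear_discriminant_scores(y, means, S_p, priors):
    """Squared Mahalanobis distances, posterior probabilities, and the
    predicted group under the linear (equal-covariance) method.

    y: (p,) observation; means: (T, p) group means;
    S_p: (p, p) pooled within-group covariance; priors: (T,) prior probs."""
    S_inv = np.linalg.inv(S_p)
    diffs = means - y                      # row t holds ybar_t - y
    # d_t^2 = (y - ybar_t)' Sp^(-1) (y - ybar_t) for every group at once
    sq_dist = np.einsum('tp,pq,tq->t', diffs, S_inv, diffs)
    # Posterior: q_t * exp(-d_t^2 / 2), normalized over the groups.
    w = priors * np.exp(-0.5 * (sq_dist - sq_dist.min()))
    prob = w / w.sum()
    return sq_dist, prob, int(np.argmax(prob))
```

Subtracting the smallest squared distance before exponentiating does not change the posteriors (the common factor cancels in the normalization) but avoids underflow when the distances are large.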

In quadratic discriminant analysis, the within-group covariance matrices are not assumed equal. The within-group covariance matrix for group t is estimated by St. This means that the total number of parameters to be estimated is Tp^2 + Tp: Tp^2 for the within-group covariance matrices and Tp for the means.
When group sample sizes are small relative to p, the estimates of the within-group covariance matrices tend to be highly variable. The discriminant score is then heavily influenced by the smallest eigenvalues of the estimated within-group covariance matrices, which dominate their inverses. See Friedman (1989). For this reason, if your group sample sizes are small compared to p, you might want to consider the Regularized method, described in Regularized Discriminant Method.
See Notation for Formulas Given by Save Formulas Options for notation. The squared Mahalanobis distance from an observation y to group t is defined as follows:

d_t^2(y) = (y - ybar_t)' St^(-1) (y - ybar_t)

The posterior probability of membership in group t is the following:

Prob(t|y) = qt |St|^(-1/2) exp(-d_t^2(y)/2) / Σ_u qu |Su|^(-1/2) exp(-d_u^2(y)/2)
An observation y is assigned to the group for which its posterior probability is the largest.
SqDist[<group t>]
The squared Mahalanobis distance from the observation to the mean of group t, computed using St.

Prob[<group t>]
The posterior probability that the observation belongs to group t.

Pred <X>
The predicted group: the level of X for which the posterior probability is largest.
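A companion numpy sketch for the quadratic case (again with illustrative names), where each group supplies its own covariance matrix and the log-determinant term no longer cancels across groups:

```python
import numpy as np

def quadratic_discriminant_scores(y, means, covs, priors):
    """Squared distances, posteriors, and predicted group when each group t
    has its own covariance matrix covs[t] (quadratic method)."""
    T = len(priors)
    # d_t^2 = (y - ybar_t)' St^(-1) (y - ybar_t), via a solve per group
    sq_dist = np.array([(y - means[t]) @ np.linalg.solve(covs[t], y - means[t])
                        for t in range(T)])
    log_dets = np.array([np.linalg.slogdet(covs[t])[1] for t in range(T)])
    # Log of q_t * |St|^(-1/2) * exp(-d_t^2/2), then normalize stably.
    log_w = np.log(priors) - 0.5 * log_dets - 0.5 * sq_dist
    w = np.exp(log_w - log_w.max())
    prob = w / w.sum()
    return sq_dist, prob, int(np.argmax(prob))
```

Working in log scale before normalizing keeps the posteriors stable even when the distances or determinants differ by many orders of magnitude.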

This method enables you to leverage two aspects of regularization to bring stability to estimates for quadratic discriminant analysis. See Friedman (1989). Two parameters control the regularization:

•
The parameter λ balances the weights assigned to the pooled covariance matrix and the within-group covariance matrices, which are not assumed equal.

•
The parameter γ determines the amount of shrinkage toward a diagonal matrix.

See Notation for Formulas Given by Save Formulas Options for notation.
The regularized covariance matrix for group t, denoted St(λ, γ), replaces St in the quadratic formulas. The posterior probability of membership in group t is given by the following:

Prob(t|y) = qt |St(λ, γ)|^(-1/2) exp(-d_t^2(y)/2) / Σ_u qu |Su(λ, γ)|^(-1/2) exp(-d_u^2(y)/2)

where d_t^2(y) = (y - ybar_t)' St(λ, γ)^(-1) (y - ybar_t).
An observation y is assigned to the group for which its posterior probability is the largest.
SqDist[<group t>]
The squared Mahalanobis distance from the observation to the mean of group t, computed using St(λ, γ).

Prob[<group t>]
The posterior probability that the observation belongs to group t.

Pred <X>
The predicted group: the level of X for which the posterior probability is largest.
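One common form of this two-parameter regularization, following Friedman (1989), first blends the group covariance with the pooled covariance (controlled by λ) and then shrinks toward a scaled identity matrix (controlled by γ). The sketch below uses that form; the exact sample-size weighting in the saved formulas may differ, so treat the weights here as an assumption.

```python
import numpy as np

def regularized_covariance(S_t, S_p, n_t, n, T, p, lam, gamma):
    """Two-stage regularized covariance for group t (Friedman-style sketch).

    lam in [0, 1] blends the group and pooled covariance matrices;
    gamma in [0, 1] shrinks the blend toward (trace/p) * identity."""
    # Stage 1: lambda-weighted blend, weighted by degrees of freedom.
    a = (1.0 - lam) * (n_t - 1)
    b = lam * (n - T)
    S_blend = (a * S_t + b * S_p) / (a + b)
    # Stage 2: gamma shrinkage toward a scaled diagonal (identity) matrix.
    return (1.0 - gamma) * S_blend + gamma * (np.trace(S_blend) / p) * np.eye(p)
```

At λ = 0, γ = 0 this reduces to the quadratic method's St; at λ = 1, γ = 0 it reduces to the linear method's pooled Sp.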

The Wide Linear method is useful when you have a large number of covariates and, in particular, when the number of covariates exceeds the number of observations (p > n). This approach centers on an efficient calculation of the inverse of the pooled within-covariance matrix Sp, or of its pseudoinverse if p > n. It uses a singular value decomposition approach to avoid inverting and allocating space for large covariance matrices.
See Notation for Formulas Given by Save Formulas Options for notation. The steps in the Wide Linear calculation are as follows:

1.
Compute the T by p matrix M of within-group sample means. The (t,j)th entry of M, m_tj, is the sample mean for members of group t on the jth covariate.

2.
Center each observation's covariates by the sample means for its group.

3.
Scale each centered value by the pooled standard deviation for that covariate. Using this notation, for an observation i in group t, the group-centered and scaled value for the jth covariate is:

x_ij = (y_ij - m_tj) / s_j

where s_j is the pooled standard deviation for the jth covariate.

6.
Denote the pooled within-covariance matrix for the group-centered and scaled covariates by R.

7.
Compute the singular value decomposition of R:

R = UDV'

where U and V are orthonormal and D is a diagonal matrix with positive entries (the singular values) on the diagonal. See The Singular Value Decomposition in Statistical Details. Because R is symmetric and nonnegative definite, U = V, so R can be written as follows:

R = VDV'

8.
The inverse of R is then the following:

R^(-1) = VD^(-1)V'

where D^(-1) is the diagonal matrix whose diagonal entries are the inverses of the diagonal entries of D.

9.
Then define the inverse square root of R as follows:

R^(-1/2) = VD^(-1/2)V'

If R is of full rank, it follows that R^(-1/2) R R^(-1/2) = I. In general, R need not be of full rank, so for completeness, the discussion continues using pseudoinverses.
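The SVD construction in the steps above can be sketched in a few lines of numpy; the function name and tolerance are illustrative, and dropping tiny singular values yields the pseudoinverse square root in the rank-deficient case just mentioned:

```python
import numpy as np

def inverse_sqrt_psd(R, tol=1e-10):
    """Inverse square root of a symmetric nonnegative definite matrix via SVD.

    Singular values at or below tol (relative to the largest) are dropped,
    which gives the pseudoinverse square root when R is not of full rank."""
    U, d, Vt = np.linalg.svd(R)            # R = U diag(d) V'
    d_inv_sqrt = np.zeros_like(d)
    mask = d > tol * d.max()
    d_inv_sqrt[mask] = 1.0 / np.sqrt(d[mask])
    return (Vt.T * d_inv_sqrt) @ Vt        # V diag(d^(-1/2)) V'
```

Because the matrix never needs to be inverted explicitly, the same decomposition also supplies R^(-1) = V diag(1/d) V' at no extra cost.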

The formulas for the Mahalanobis distance, the likelihood, and the posterior probabilities are identical to those in Linear Discriminant Method. However, the inverse of Sp is replaced by a generalized inverse computed using the singular value decomposition.
When you save the formulas, the Mahalanobis distance is given in terms of the decomposition. For an observation y, the squared distance to group t is the following:

d_t^2(y) = (y - ybar_t)' Sp^- (y - ybar_t)

where Sp^- is the generalized inverse of Sp obtained from the singular value decomposition.
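To illustrate why the decomposition form works, the brief check below (illustrative data, not part of the saved formulas) verifies that the distance computed with the SVD-based generalized inverse equals the squared Euclidean length of the centered observation after the inverse-square-root transformation:

```python
import numpy as np

# z' (V diag(1/d) V') z  equals  || (V diag(d^(-1/2)) V') z ||^2, so the
# generalized-inverse Mahalanobis distance is an ordinary squared distance
# in the transformed (within-group uncorrelated) coordinates.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))             # illustrative data, full rank here
R = np.cov(X, rowvar=False)
U, d, Vt = np.linalg.svd(R)
R_pinv = (Vt.T / d) @ Vt                 # V diag(1/d) V'
R_inv_half = (Vt.T / np.sqrt(d)) @ Vt    # V diag(d^(-1/2)) V'
z = rng.normal(size=3) - X.mean(axis=0)  # centered observation
d2_direct = z @ R_pinv @ z
d2_transformed = np.sum((R_inv_half @ z) ** 2)
print(np.isclose(d2_direct, d2_transformed))  # prints True
```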
The data transformed by the principal component scoring matrix, which renders the data uncorrelated within groups. The transformation centers the observation at the overall means (a 1 by p vector) and then applies the scoring matrix.


SqDist[<group t>]
The squared Mahalanobis distance from the observation to the mean of group t, computed using the generalized inverse of Sp.

Prob[<group t>]
The posterior probability that the observation belongs to group t.

Pred <X>
The predicted group: the level of X for which the posterior probability is largest.

Using the notation in Notation for Formulas Given by Save Formulas Options, the principal component scoring matrix is defined as follows: