Partial least squares fits linear models based on linear combinations, called factors, of the explanatory variables (Xs). These factors are obtained in a way that attempts to maximize the covariance between the Xs and the response or responses (Ys). In this way, PLS exploits the correlations between the Xs and the Ys to reveal underlying latent structures. The factors address the combined goals of explaining response variation and predictor variation. Partial least squares is particularly useful when you have more X variables than observations or when the X variables are highly correlated.
The NIPALS method works by extracting one factor at a time. Let X = X0 be the centered and scaled matrix of predictors and Y = Y0 the centered and scaled matrix of response values. The PLS method starts with a linear combination t = X0w of the predictors, where t is called a score vector and w is its associated weight vector. The PLS method predicts both X0 and Y0 by regression on t:
X0 is predicted by tp´, where p´ = (t´t)⁻¹t´X0
Y0 is predicted by tc´, where c´ = (t´t)⁻¹t´Y0
The vectors p and c are called the X- and Y-loadings, respectively.
The specific linear combination t = X0w is the one that has maximum covariance t´u with some response linear combination u = Y0q. Another characterization is that the X- and Y-weights, w and q, are proportional to the first left and right singular vectors of the covariance matrix X0´Y0 or, equivalently, to the first eigenvectors of X0´Y0Y0´X0 and Y0´X0X0´Y0, respectively.
The residuals from these regressions are used to extract the next factor:
X1 = X0 - tp´
Y1 = Y0 - tc´
These residuals are also called the deflated X and Y blocks. The process of extracting a score vector and deflating the data matrices is repeated for as many extracted factors as desired.
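As a concrete illustration, the single-factor extraction and deflation steps above can be sketched in Python. This is a minimal sketch, not the platform's implementation; the function name and the use of an SVD of X´Y to obtain the weight vector are our own choices.

```python
import numpy as np

def nipals_pls(X0, Y0, n_factors):
    """Minimal NIPALS sketch following the text (function name is ours).

    X0, Y0 are assumed already centered and scaled. Returns the score
    matrix T and the X- and Y-loading matrices P and C.
    """
    X, Y = X0.astype(float).copy(), Y0.astype(float).copy()
    T, P, C = [], [], []
    for _ in range(n_factors):
        # w: first left singular vector of X'Y (the weight vector that
        # maximizes the covariance t'u described in the text)
        w = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]
        t = X @ w                    # score vector t = Xw
        p = X.T @ t / (t @ t)        # X-loading: p' = (t't)^-1 t'X
        c = Y.T @ t / (t @ t)        # Y-loading: c' = (t't)^-1 t'Y
        X = X - np.outer(t, p)       # deflated X block: X1 = X0 - tp'
        Y = Y - np.outer(t, c)       # deflated Y block: Y1 = Y0 - tc'
        T.append(t); P.append(p); C.append(c)
    return np.column_stack(T), np.column_stack(P), np.column_stack(C)
```

Because each deflated X block is orthogonal to the preceding score vector, the extracted X-scores come out mutually orthogonal.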
The SIMPLS algorithm was developed to optimize a statistical criterion: it finds score vectors that maximize the covariance between linear combinations of Xs and Ys, subject to the requirement that the X-scores are orthogonal. Unlike NIPALS, where the matrices X0 and Y0 are deflated, SIMPLS deflates the cross-product matrix, X0´Y0.
In the case of a single Y variable, these two algorithms are equivalent. However, for multivariate Y, the models differ. SIMPLS was suggested by De Jong (1993).
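A minimal sketch of the SIMPLS iteration contrasts its deflation of the cross-product matrix with the NIPALS deflation of the data blocks. The variable names and the orthonormalization details below are our reading of De Jong (1993), not a definitive implementation.

```python
import numpy as np

def simpls(X, Y, n_factors):
    """Hedged sketch of SIMPLS (De Jong 1993); names are ours.

    Instead of deflating X and Y, the cross-product matrix S = X'Y is
    projected orthogonally to the span of the previous X-loadings.
    """
    S = X.T @ Y
    T, R, V = [], [], []
    for _ in range(n_factors):
        r = np.linalg.svd(S, full_matrices=False)[0][:, 0]  # X-weight
        t = X @ r
        t = t / np.linalg.norm(t)        # normalized X-score
        p = X.T @ t                      # X-loading
        v = p.copy()                     # orthonormalize p against earlier loadings
        for vi in V:
            v -= (vi @ p) * vi
        v /= np.linalg.norm(v)
        V.append(v)
        S = S - np.outer(v, v @ S)       # deflate the cross-product matrix
        T.append(t); R.append(r)
    return np.column_stack(T), np.column_stack(R)
```

For a single y, the first weight is proportional to X´y, which is also the first NIPALS direction, illustrating the equivalence of the two algorithms in the univariate case.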
The van der Voet T2 test helps determine whether a model with a specified number of extracted factors differs significantly from a proposed optimum model. The test is a randomization test based on the null hypothesis that the squared residuals for both models have the same distribution. Intuitively, one can think of the null hypothesis as stating that both models have the same predictive ability.
To obtain the van der Voet T2 statistic given in the Cross Validation report, the calculation below is performed on each validation set. In the case of a single validation set, the result is the reported value. In the case of Leave-One-Out and KFold validation, the results for each validation set are averaged.
Denote by Rjk,i the jth predicted residual for response k for the model with i extracted factors, and denote by Rjk,opt the corresponding quantity for the model based on the proposed optimum number of factors, opt. The test statistic is based on the following differences:
Djk,i = Rjk,i² - Rjk,opt²
Suppose that there are K responses. Consider the following notation:
The van der Voet statistic for i extracted factors is defined as follows:
The significance level is obtained by comparing Ci with the distribution of values that results from randomly exchanging Rjk,i² and Rjk,opt². A Monte Carlo sample of such values is simulated and the significance level is approximated as the proportion of simulated critical values that are greater than Ci.
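For a single response, the randomization can be sketched as follows. Exchanging a pair of squared residuals flips the sign of their difference, so the null distribution reduces to random sign flips on the differences; the function name, the simple sum statistic, and the defaults are illustrative assumptions, not the platform's exact multivariate computation.

```python
import numpy as np

def van_der_voet_p(res_i, res_opt, n_sim=4000, seed=7):
    """Sketch of the randomization test for one response (names are ours).

    res_i, res_opt: predicted residuals for the i-factor model and the
    proposed optimum model on the same validation observations.
    """
    d = res_i**2 - res_opt**2                  # squared-residual differences
    C = d.sum()                                # observed statistic
    rng = np.random.default_rng(seed)
    # exchanging a pair of squared residuals flips the sign of d_j
    signs = rng.choice([-1.0, 1.0], size=(n_sim, d.size))
    C_sim = (signs * d).sum(axis=1)            # randomization distribution
    return np.mean(C_sim >= C)                 # approximate significance level
```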
T2 Plot
The T2 value for the ith observation is computed as follows:
Ti² = Σ(j = 1 to p) tij²/sj²
where tij = X score for the ith row and jth extracted factor, sj² = sample variance of the scores for the jth extracted factor, p = number of extracted factors, and n = number of observations used to train the model. If validation is not used, n = total number of observations.
The control limit for the T2 Plot is computed as follows:
((n-1)²/n)*BetaQuantile(0.95, p/2, (n-p-1)/2)
where p = number of extracted factors, and n = number of observations used to train the model. If validation is not used, n = total number of observations. See Tracy, Young, and Mason, 1992.
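The control limit can be reproduced directly from the formula above using SciPy's beta quantile function. The per-observation T2 computation shown here assumes the usual Hotelling-type scaling of each centered score column by its sample variance; that scaling is our assumption from context.

```python
import numpy as np
from scipy.stats import beta

def t2_values(T):
    """T2 per observation from the n x p matrix of training X-scores,
    assuming centered scores and Hotelling-type scaling by the sample
    variance of each factor's scores (an assumption, see lead-in)."""
    s2 = T.var(axis=0, ddof=1)           # per-factor score variance
    return (T**2 / s2).sum(axis=1)

def t2_limit(n, p):
    """Control limit from the text:
    ((n-1)^2/n) * BetaQuantile(0.95, p/2, (n-p-1)/2)."""
    return (n - 1)**2 / n * beta.ppf(0.95, p / 2, (n - p - 1) / 2)
```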
Consider a scatterplot for score i on the vertical axis and score j on the horizontal axis. The coordinates of the top, bottom, left, and right extremes of the ellipse are as follows:
where z = ((n-1)²/n)*BetaQuantile(0.95, 1, (n-3)/2). For background on the z value, see Tracy, Young, and Mason, 1992.
Let X denote the matrix of predictors and Y the matrix of response values, which might be centered and scaled based on your selections in the launch window. Assume that the components of Y are independent and normally distributed with a common variance σ².
Hoskuldsson (1988) observes that the PLS model for Y in terms of scores is formally similar to a multiple linear regression model. He uses this similarity to derive an approximate formula for the variance of a predicted value. See also Umetrics (1995). However, Denham (1997) points out that any value predicted by PLS is a non-linear function of the Ys. He suggests bootstrap and cross validation techniques for obtaining prediction intervals. The PLS platform uses the normality-based approach described in Umetrics (1995).
Denote the matrix whose columns are the scores by T and consider a new observation on X, x0. The predictive model for Y is obtained by regressing Y on T. Denote the score vector associated with x0 by t0.
Let a denote the number of factors. Define s² to be the sum of squares of the residuals divided by its degrees of freedom, df = n - a - 1 if the data are centered and df = n - a if the data are not centered. The value of s² is an estimate of σ².
The standard error of the predicted mean at x0 is estimated by the following:
Let t0.975, df denote the 0.975 quantile of a t distribution with degrees of freedom df = n - a - 1 if the data are centered and df = n - a if the data are not centered.
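Under these assumptions, a 95% interval for the predicted mean can be sketched by regressing the response on the scores, per Hoskuldsson's multiple-linear-regression analogy. The standard-error form s·sqrt(t0´(T´T)⁻¹t0) is the usual regression expression and is our assumption here; the function name is illustrative.

```python
import numpy as np
from scipy.stats import t as t_dist

def pls_mean_interval(T, y, t0, centered=True):
    """Sketch of a 95% interval for the predicted mean at x0 (names ours).

    T: n x a training score matrix; y: centered/scaled response vector;
    t0: score vector for the new observation x0.
    """
    n, a = T.shape
    G = T.T @ T
    b = np.linalg.solve(G, T.T @ y)          # regress y on the scores
    resid = y - T @ b
    df = n - a - 1 if centered else n - a    # residual degrees of freedom
    s2 = resid @ resid / df                  # estimate of sigma^2
    # assumed regression-on-scores form: s * sqrt(t0'(T'T)^-1 t0)
    se = np.sqrt(s2 * (t0 @ np.linalg.solve(G, t0)))
    y0 = t0 @ b
    q = t_dist.ppf(0.975, df)                # t quantile with df as above
    return y0 - q * se, y0 + q * se
```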
ntr is the number of observations in the training set
m is the number of effects in X
k is the number of responses in Y
VarXi is the percent variation in X explained by the ith factor
VarYi is the percent variation in Y explained by the ith factor
XScorei is the vector of X scores for the ith factor
YScorei is the vector of Y scores for the ith factor
XLoadi is the vector of X loadings for the ith factor
YLoadi is the vector of Y loadings for the ith factor
The vector of ith Standardized X Scores is defined as follows:
The vector of ith Standardized Y Scores is defined as follows:
The vector of ith Standardized X Loadings is defined as follows:
The vector of ith Standardized Y Loadings is defined as follows:
When a categorical variable is entered as Y in the launch window, it is coded using indicator coding. If there are k levels, each level is represented by an indicator variable with the value 1 for rows in that level and 0 otherwise. The resulting k indicator variables are treated as continuous and the PLS analysis proceeds as it would with continuous Ys.
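The indicator coding can be sketched as follows; the function name is illustrative.

```python
import numpy as np

def indicator_code(labels):
    """Indicator coding as described: one 0/1 column per level,
    with 1 marking the rows that fall in that level."""
    levels = sorted(set(labels))
    M = np.array([[1.0 if lab == lev else 0.0 for lev in levels]
                  for lab in labels])
    return M, levels
```

The resulting columns are then treated as continuous responses in the PLS analysis.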