An l2 penalty is applied to the regression coefficients during ridge regression. Ridge regression coefficient estimates are given by the following:

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta}\left\{-\sum_{i=1}^{N}\log L(y_i;\beta) + \lambda\sum_{j=1}^{p}\beta_j^{2}\right\}$$

where $\sum_{j=1}^{p}\beta_j^{2}$ is the l2 penalty, λ is the tuning parameter, N is the number of rows, and p is the number of variables.
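To illustrate how the l2 penalty shrinks an estimate, here is a minimal sketch for the special case of a single centered predictor with no intercept, where the ridge solution has the closed form sum(x*y) / (sum(x^2) + λ). The function name and data are illustrative, not taken from the documentation.

```python
# Minimal sketch: ridge estimate for a single centered predictor,
# using the closed form beta_hat = sum(x*y) / (sum(x^2) + lam).

def ridge_single(x, y, lam):
    """Ridge coefficient for one predictor (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -2.2, 0.1, 1.9, 4.3]

# lam = 0 gives the unpenalized (maximum likelihood) estimate;
# larger lam shrinks the coefficient toward zero.
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_single(x, y, lam))
```

Note that the coefficient shrinks toward zero as λ grows but never reaches it exactly, which is characteristic of the l2 penalty.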
An l1 penalty is applied to the regression coefficients during Lasso. Coefficient estimates for the Lasso are given by the following:

$$\hat{\beta}_{\text{lasso}} = \arg\min_{\beta}\left\{-\sum_{i=1}^{N}\log L(y_i;\beta) + \lambda\sum_{j=1}^{p}|\beta_j|\right\}$$

where $\sum_{j=1}^{p}|\beta_j|$ is the l1 penalty, λ is the tuning parameter, N is the number of rows, and p is the number of variables.
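Unlike the l2 penalty, the l1 penalty can set coefficients exactly to zero. A minimal sketch of this behavior: for a single predictor scaled so that sum(x^2) = 1, the Lasso solution is the soft-thresholded least squares estimate. The names and values below are illustrative.

```python
# Minimal sketch: with one predictor scaled so sum(x^2) = 1, the lasso
# solution is beta_hat = sign(b) * max(|b| - lam, 0), where b is the
# unpenalized least squares estimate.

def soft_threshold(b, lam):
    """Soft-thresholding operator induced by the l1 (lasso) penalty."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

b = 2.5  # unpenalized estimate for the standardized predictor
# lam = 0 leaves the estimate unchanged; a large enough lam
# sets the coefficient exactly to zero.
for lam in (0.0, 1.0, 3.0):
    print(lam, soft_threshold(b, lam))
```

This exact-zeroing is why the Lasso performs variable selection while ridge regression does not.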
The Elastic Net combines both l1 and l2 penalties. Coefficient estimates for the Elastic Net are given by the following:

$$\hat{\beta}_{\text{EN}} = \arg\min_{\beta}\left\{-\sum_{i=1}^{N}\log L(y_i;\beta) + \lambda\left[\alpha\sum_{j=1}^{p}|\beta_j| + (1-\alpha)\sum_{j=1}^{p}\beta_j^{2}\right]\right\}$$

where $\sum_{j=1}^{p}|\beta_j|$ is the l1 penalty, $\sum_{j=1}^{p}\beta_j^{2}$ is the l2 penalty, and λ is the tuning parameter. The α tuning parameter determines the mix of the l1 and l2 penalties, N is the number of rows, and p is the number of variables.
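A small sketch of how α mixes the two penalties, using the common parameterization λ(α·l1 + (1−α)·l2); the exact parameterization in a given implementation may differ, and the function name and data are illustrative.

```python
# Illustrative sketch of the elastic net penalty under the common
# parameterization lam * (alpha * l1 + (1 - alpha) * l2).

def elastic_net_penalty(beta, lam, alpha):
    l1 = sum(abs(b) for b in beta)   # l1 penalty: sum of |beta_j|
    l2 = sum(b * b for b in beta)    # l2 penalty: sum of beta_j^2
    return lam * (alpha * l1 + (1 - alpha) * l2)

beta = [0.5, -1.0, 2.0]
# alpha = 1 recovers the pure lasso penalty; alpha = 0 recovers ridge.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, elastic_net_penalty(beta, lam=1.0, alpha=alpha))
```

Intermediate values of α retain some of the Lasso's variable selection while keeping the ridge penalty's stability for correlated predictors.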
The adaptive Lasso method uses weighted penalties to provide consistent estimates of coefficients. The weighted form of the l1 penalty is

$$\sum_{j=1}^{p} w_j\,|\beta_j|$$

where the weights $w_j$ are based on initial estimates of the coefficients. For the adaptive Lasso, this weighted form of the l1 penalty is used in determining the coefficients.
The adaptive Elastic Net uses this weighted form of the l1 penalty and also imposes a weighted form of the l2 penalty. The weighted form of the l2 penalty for the adaptive Elastic Net is

$$\sum_{j=1}^{p} w_j\,\beta_j^{2}$$
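A brief sketch of the weighting idea, assuming the common choice w_j = 1/|β̂_j| built from initial (for example, maximum likelihood) estimates; the specific weight formula is an assumption here, and all names and data are illustrative.

```python
# Illustrative sketch: adaptive weights built from initial coefficient
# estimates, assuming the common choice w_j = 1 / |beta_hat_j|.
# Large initial coefficients get small weights (penalized lightly);
# small initial coefficients get large weights (penalized heavily).

def adaptive_weights(initial_beta):
    return [1.0 / abs(b) for b in initial_beta]

def weighted_l1(beta, w):
    # weighted l1 penalty used by the adaptive Lasso
    return sum(wj * abs(bj) for wj, bj in zip(w, beta))

def weighted_l2(beta, w):
    # weighted l2 penalty used by the adaptive Elastic Net
    return sum(wj * bj * bj for wj, bj in zip(w, beta))

initial = [2.0, 0.1, -1.0]      # initial (unpenalized) estimates
w = adaptive_weights(initial)
beta = [1.8, 0.05, -0.9]        # candidate penalized coefficients
print(weighted_l1(beta, w), weighted_l2(beta, w))
```

The differential weighting is what allows the adaptive methods to shrink small, likely-spurious coefficients aggressively while leaving large coefficients nearly unbiased.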
The tuning parameters for ridge regression and the Lasso are found by searching a grid of values and selecting the tuning parameter that minimizes the penalized likelihood. This grid lies between the minimum and maximum tuning parameters. When the minimum tuning parameter is zero, the solution is unpenalized and the coefficients are the MLEs.
Values between the minimum and maximum tuning parameters (by default, between 0 and 1) are iteratively searched to determine the best tuning parameter. The grid of possible tuning parameters can be set up in three different scales: linear, log, and square root.
In some cases, there is a large gap between the unpenalized estimates and the previous step. This large gap can distort the solution path. The log scale focuses its search on small tuning parameter values with few large values, whereas the linear scale evenly disperses the search from the minimum to the maximum value. The square root scale is a compromise between the other two scales. Options for Tuning Parameter Grid Scale shows the different grid scales.
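The three scales can be sketched as follows: each one spaces grid points evenly in a transformed space and maps them back. This is a minimal illustration under assumed endpoints, not the software's actual grid routine.

```python
# Illustrative sketch of the three grid scales for the tuning parameter
# search: points are evenly spaced in a transformed space, then mapped back.
import math

def tuning_grid(lo, hi, n, scale="linear"):
    """Return n candidate tuning parameter values from lo to hi."""
    if scale == "linear":
        f, finv = (lambda t: t), (lambda t: t)
    elif scale == "sqrt":
        f, finv = math.sqrt, (lambda t: t * t)
    elif scale == "log":
        f, finv = math.log, math.exp   # requires lo > 0
    else:
        raise ValueError(scale)
    a, b = f(lo), f(hi)
    return [finv(a + (b - a) * i / (n - 1)) for i in range(n)]

# The log scale concentrates points near the small end of the range;
# the square root scale sits between log and linear.
for scale in ("linear", "sqrt", "log"):
    print(scale, [round(v, 4) for v in tuning_grid(0.01, 1.0, 5, scale)])
```

Comparing the printed grids shows why the log scale helps when the interesting behavior happens at small tuning parameter values: most of its points fall there.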
The distributions fit by the Generalized Regression personality are given below in terms of the parameters used in model fitting. Although not stated explicitly in their descriptions, the Generalized Regression personality enables you to enter non-integer values for the discrete distributions.