Modeling techniques such as the Elastic Net and the Lasso are particularly promising for large data sets, where collinearity is typically a problem. In fact, modern data sets often include more variables than observations. This situation is sometimes referred to as the p > n problem, where n is the number of observations and p is the number of predictors. Such data sets require variable selection if traditional modeling techniques are to be used.
• The Lasso has two shortcomings. When several variables are highly correlated, it tends to select only one variable from that group. When the number of variables, p, exceeds the number of observations, n, the Lasso selects at most n predictors.

• The Elastic Net, on the other hand, tends to select all variables from a correlated group, fitting appropriate coefficients. It can also select more than n predictors when p > n. The Elastic Net fit generally takes more processing time than the Lasso.
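
The contrast between the two penalties on a correlated group can be illustrated with a small sketch. It is not the Generalized Regression implementation. It assumes a textbook objective, (1/(2n))·||y − Xb||² + lam·(alpha·||b||₁ + (1 − alpha)/2·||b||₂²), fit by cyclic coordinate descent, and the names elastic_net_cd, lam, and alpha are hypothetical. The toy data use an exact duplicate column as an extreme case of correlation. In that case the Lasso solution is not unique, and coordinate descent started from zero happens to keep only the first copy. The sketch does not cover the p > n behavior.

```python
def soft_threshold(rho, t):
    """Soft-thresholding operator: sign(rho) * max(|rho| - t, 0)."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_sweeps=200):
    """Cyclic coordinate descent for the assumed objective
    (1/(2n))*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    Setting alpha = 1 gives the Lasso."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            z = 0.0    # mean square of predictor j
            for i in range(n):
                # Fit from every predictor except j.
                fitted_others = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted_others)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            b[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
    return b


# Toy data (illustrative only): columns 0 and 1 are identical,
# column 2 is unrelated to the response.
X = [
    [1.0, 1.0, 1.0],
    [-1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0],
    [-1.0, -1.0, -1.0],
]
y = [2.0, -2.0, 2.0, -2.0]

lasso_coefs = elastic_net_cd(X, y, lam=0.5, alpha=1.0)
enet_coefs = elastic_net_cd(X, y, lam=0.5, alpha=0.5)

print("Lasso:      ", [round(c, 3) for c in lasso_coefs])
print("Elastic Net:", [round(c, 3) for c in enet_coefs])
```

On these data the Lasso fit puts the whole effect on the first of the two identical columns and leaves the second at zero, while the Elastic Net fit assigns the two columns equal nonzero coefficients. Both fits leave the unrelated column at zero.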

The Generalized Regression personality also fits adaptive versions of the Lasso and the Elastic Net. These adaptive versions attempt to penalize active variables less than inactive variables. They were developed to ensure that the oracle property holds. The oracle property guarantees that, asymptotically, your estimates are what they would have been had you known in advance which predictors were active contributors to the model. More specifically, your model correctly identifies the predictors that should have zero coefficients, and your estimates converge to those that would have been obtained had you started with only the active predictors.
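
The idea of penalizing active variables less than inactive ones can be sketched as follows. The sketch assumes the common adaptive Lasso formulation, in which each coefficient's penalty is scaled by a weight 1/|initial estimate|^gamma and the initial estimates come from least squares. It also assumes mutually orthogonal predictor columns so that each coefficient has a closed-form solution. The function names, the gamma parameter, and the toy data are hypothetical, and the weights that Generalized Regression actually uses may differ.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator: sign(value) * max(|value| - threshold, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ols_orthogonal(X, y):
    """Least squares estimates, assuming mutually orthogonal columns."""
    n, p = len(X), len(X[0])
    return [
        sum(X[i][j] * y[i] for i in range(n)) / sum(X[i][j] ** 2 for i in range(n))
        for j in range(p)
    ]


def weighted_lasso_orthogonal(X, y, lam, weights):
    """Minimize (1/(2n))*||y - Xb||^2 + lam * sum_j weights[j] * |b_j|,
    assuming mutually orthogonal columns (closed form per coefficient)."""
    n, p = len(X), len(X[0])
    coefs = []
    for j in range(p):
        z = sum(X[i][j] ** 2 for i in range(n)) / n
        rho = sum(X[i][j] * y[i] for i in range(n)) / n
        coefs.append(soft_threshold(rho, lam * weights[j]) / z)
    return coefs


# Toy data (illustrative only): the least squares estimates are
# about 2.0, 0.2, and -0.1, so only the first predictor has a large effect.
X = [
    [1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [2.1, -1.7, 1.9, -2.3]

gamma = 1.0  # assumed exponent for the adaptive weights
initial = ols_orthogonal(X, y)
# Assumes no initial estimate is exactly zero.
adaptive_weights = [1.0 / abs(b) ** gamma for b in initial]

plain_coefs = weighted_lasso_orthogonal(X, y, lam=0.15, weights=[1.0, 1.0, 1.0])
adaptive_coefs = weighted_lasso_orthogonal(X, y, lam=0.15, weights=adaptive_weights)

print("Lasso:         ", [round(c, 3) for c in plain_coefs])
print("Adaptive Lasso:", [round(c, 3) for c in adaptive_coefs])
```

With the same penalty value, the plain Lasso shrinks the large coefficient to about 1.85 and keeps a small coefficient of about 0.05 on the second predictor. The adaptive weights shrink the large coefficient less, to about 1.925, and set both small coefficients to zero.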