Launch the Generalized Regression personality by selecting Analyze > Fit Model, entering one or more columns for Y, and selecting Generalized Regression from the Personality menu (Fit Model Launch Window with Generalized Regression Selected).
Fit Model Launch Window with Generalized Regression Selected
When you select Generalized Regression from the Personality menu, the Distribution option appears. Here you can specify a distribution for Y. The abbreviation ZI means zero-inflated. The options are described below.
Y has a normal distribution with mean μ and standard deviation σ. The link function for μ is the identity; that is, the mean of Y is expressed as a linear model. In estimating the model parameters, the scale parameter σ is profiled out and then replaced by its maximum likelihood estimate (MLE). (Note that the MLE is not the typical unbiased estimator for σ: it is based on dividing the residual sum of squares by n rather than by the residual degrees of freedom.)
Y has a binomial distribution with parameters p and n. The link function for p is the logit. When you select Binomial as the Distribution, the response variable must be specified in one of the following ways.
Y has a Poisson distribution with mean λ. The link function for λ is the logarithm. Poisson regression is permitted even if Y assumes non-integer values.
Y has a zero-inflated Poisson distribution with mean parameter λ and zero-inflation parameter π. The parameter λ is the conditional mean of the observations that come from the Poisson distribution rather than from the inflating zeros. The link function for λ is the logarithm. ZI Poisson regression is permitted even if Y has no observed zeros or assumes non-integer values. See Distributions.
Y has a negative binomial distribution with mean μ and dispersion parameter σ. The link function for μ is the logarithm. Negative binomial regression is permitted even if Y assumes non-integer values. See Distributions.
Y has a zero-inflated negative binomial distribution with location parameter μ, dispersion parameter σ, and zero-inflation parameter π. The parameter μ is the conditional mean of the observations that come from the negative binomial distribution rather than from the inflating zeros. The link function for μ is the logarithm. Zero-inflated negative binomial regression is permitted even if Y has no observed zeros or assumes non-integer values. See Distributions.
Y has a gamma distribution with parameters μ and σ. The link function for μ is the logarithm. See Distributions.
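The link functions named above (identity, logit, and logarithm) can be illustrated with a minimal Python sketch. It is not JMP's implementation. The dictionary keys are illustrative labels for the distributions described above, and the function names are assumptions introduced here. The sketch maps a linear predictor back to the modeled parameter by applying the inverse of each link.

```python
import math

def inv_identity(eta):
    # Identity link: the parameter equals the linear predictor.
    return eta

def inv_logit(eta):
    # Inverse of the logit link: maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))

def inv_log(eta):
    # Inverse of the log link: maps any real number to a positive value.
    return math.exp(eta)

# Illustrative labels (assumed), paired with the link stated in the text.
INVERSE_LINK = {
    "Normal": inv_identity,
    "Binomial": inv_logit,
    "Poisson": inv_log,
    "ZI Poisson": inv_log,
    "Negative Binomial": inv_log,
    "ZI Negative Binomial": inv_log,
    "Gamma": inv_log,
}

def linear_predictor(intercept, coefs, x):
    # eta = intercept + sum of coefficient * predictor value
    return intercept + sum(b * v for b, v in zip(coefs, x))

def fitted_parameter(distribution, intercept, coefs, x):
    # Returns mu, p, or lambda, depending on the distribution.
    eta = linear_predictor(intercept, coefs, x)
    return INVERSE_LINK[distribution](eta)
```

For example, `fitted_parameter("Binomial", 0.0, [1.0], [0.0])` gives a probability of 0.5, because a linear predictor of zero corresponds to even odds under the logit link.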
Requirements for Y for Distributions gives the Data Types, Modeling Types, and other requirements for Y variables assigned the various distributions.
Normal: μ, σ
ZI Poisson: λ, π (zero-inflation)
Negative Binomial: μ, σ
ZI Negative Binomial: μ, σ, π (zero-inflation)
Gamma: μ, σ
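The role of the zero-inflation parameter π can be illustrated with a small Python sketch of a zero-inflated Poisson probability function. It assumes the common mixture form, in which an observation is a structural zero with probability π and otherwise follows a Poisson distribution with mean λ. The exact parameterization used by the platform is given in Distributions and might differ. The function names are introduced here for illustration only.

```python
import math

def poisson_pmf(y, lam):
    # Ordinary Poisson probability of the count y.
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zi_poisson_pmf(y, lam, pi):
    # Assumed mixture form: extra mass pi is added at zero only.
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base

def zi_poisson_mean(lam, pi):
    # Unconditional mean; lam is the mean of the Poisson part alone.
    return (1.0 - pi) * lam
```

Under this form, λ remains the conditional mean of the Poisson component, while the overall mean of Y is reduced by the factor 1 − π.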
After selecting an appropriate Distribution, click Run. The Generalized Regression Model Launch panel opens.
The estimation methods apply two types of penalties:
the l1 penalty, which penalizes the sum of the absolute values of the regression coefficients
the l2 penalty, which penalizes the sum of the squares of the regression coefficients
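The two penalties can be written as short Python functions. This is a minimal sketch. The function names are introduced here, and the coefficient list is assumed to exclude the intercept, which is a common convention but is not stated above.

```python
def l1_penalty(coefs):
    # Sum of the absolute values of the regression coefficients.
    return sum(abs(b) for b in coefs)

def l2_penalty(coefs):
    # Sum of the squares of the regression coefficients.
    return sum(b * b for b in coefs)
```

For the coefficients 1.0, -2.0, and 0.5, the l1 penalty is 3.5 and the l2 penalty is 5.25.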
Computes parameter estimates by applying an l1 penalty. Due to the l1 penalty, some coefficients can be estimated as zero. Thus, variable selection is performed as part of the fitting procedure. In the ordinary Lasso, all coefficients are equally penalized.
Computes parameter estimates by penalizing a weighted sum of the absolute values of the regression coefficients. The weights in the l1 penalty are determined by the data in such a way as to guarantee the oracle property (Zou, 2006). This option is not available if MLEs for the regression parameters cannot be computed. MLEs cannot be computed when the number of predictors exceeds the number of observations or when there are strict linear dependencies among the predictors.
Computes parameter estimates by applying both an l1 penalty and an l2 penalty. The l1 penalty ensures that variable selection is performed. The l2 penalty improves predictive ability by shrinking the coefficients as ridge does.
Computes parameter estimates using an adaptive l1 penalty as well as an l2 penalty. This option is not available if MLEs for the regression parameters cannot be computed. MLEs cannot be computed when the number of predictors exceeds the number of observations or when there are strict linear dependencies among the predictors.
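The way these four methods differ can be seen in a simplified setting with a single coefficient. The Python sketch below is not the algorithm used by the platform. It assumes a one-dimensional least squares problem in which z is the unpenalized estimate, so that each penalized estimate has a closed form. The tuning parameters `lam1` and `lam2` and the adaptive weight 1/|z| are assumptions chosen for illustration; the weight follows a common choice in the spirit of Zou (2006).

```python
import math

def soft_threshold(z, t):
    # Minimizer of 0.5*(z - b)**2 + t*abs(b): shrink toward zero by t,
    # and return exactly zero when abs(z) <= t.
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_1d(z, lam1):
    # l1 penalty only: small estimates are set exactly to zero.
    return soft_threshold(z, lam1)

def ridge_1d(z, lam2):
    # l2 penalty only (0.5*lam2*b**2): shrinks, but never to exactly zero.
    return z / (1.0 + lam2)

def elastic_net_1d(z, lam1, lam2):
    # Both penalties: threshold first, then shrink as ridge does.
    return soft_threshold(z, lam1) / (1.0 + lam2)

def adaptive_lasso_1d(z, lam1):
    # Assumed weight 1/abs(z): large estimates are penalized less.
    if z == 0:
        return 0.0
    return soft_threshold(z, lam1 / abs(z))
```

With `lam1 = 0.5`, an unpenalized estimate of 0.3 is set to zero by both the Lasso and the adaptive Lasso, while an estimate of 2.0 becomes 1.5 under the Lasso and 1.75 under the adaptive Lasso, which penalizes it less.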
Partitions the data into K subsets, or folds. In turn, each fold is used to validate the model that is fit to the rest of the data, fitting a total of K models. This method is best for small data sets, because it makes efficient use of limited amounts of data.
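The partitioning described above can be sketched in Python as follows. This is an illustration only. The function names and the random assignment of rows to folds are assumptions, and the platform might assign folds differently.

```python
import random

def kfold_indices(n, k, seed=0):
    # Shuffle the row indices, then deal them into k folds of near-equal size.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def kfold_splits(n, k, seed=0):
    # Yield (training rows, validation rows) once per fold, k times in total.
    folds = kfold_indices(n, k, seed)
    for i, validation in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation
```

Each row appears in exactly one validation fold, so every observation is used both for fitting (in K − 1 of the models) and for validation (in one model).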
Minimizes the Bayesian Information Criterion (BIC) over the solution path. If k is the number of parameters and n is the sample size, the BIC is defined as:
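In its standard form, with $\hat{L}$ denoting the maximized likelihood of the fitted model (a symbol assumed here), the criterion can be written as:

```latex
\mathrm{BIC} = -2 \ln \hat{L} + k \ln(n)
```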
Minimizes the Akaike Information Criterion (AIC) over the solution path. If k is the number of parameters and n is the sample size, the AIC is defined as:
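In its standard form, with $\hat{L}$ denoting the maximized likelihood of the fitted model (a symbol assumed here), the AIC does not depend on n. Because the definition above refers to the sample size, the small-sample corrected variant, commonly written AICc, might be the form intended; this is an assumption. Both forms are shown:

```latex
\mathrm{AIC} = -2 \ln \hat{L} + 2k
\qquad
\mathrm{AICc} = -2 \ln \hat{L} + 2k + \frac{2k(k+1)}{n - k - 1}
```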
When you click Go, a report opens. The title of the report specifies the fitting and validation methods that you selected. You can return to the Model Launch window to perform additional analyses and choose other estimation and validation methods.