Main Methods

Click on a button corresponding to a predictive modeling main method. All processes require a wide data set. (See Tall and Wide Data Sets.) For a more thorough introduction to predictive modeling and these processes, see Introduction to Predictive Modeling.

Refer to the table below for key features and general guidance on these processes. You are encouraged to explore multiple processes and use the individual process links for a more detailed explanation of each.

Tip: When in doubt, there is no harm in trying several predictive modeling methods on your data. The Predictive Modeling Review enables you to standardize model parameters and specifications. Additional tools are also available in the Model Comparisons submenu for this purpose.

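Process: Predictive Modeling Review

Process: Distance Scoring
Uses SAS PROC(s): DISTANCE
Permits dependent variables of type: Nominal, Binary, Ordinal, Continuous
Classification boundary shape for binary dependent variable: Variable; depends on the distance metric
Other classification and process characteristics: Nonparametric discriminant method
Tip: Diagonal Linear Discriminant Analysis can be performed via the Euclidean Distance Metric.

Process: General Linear Model Selection
Uses SAS PROC(s): GLMSELECT
Permits dependent variables of type: Binary, Ordinal, Continuous
Classification boundary shape for binary dependent variable: Linear
Other classification and process characteristics:
• Flexible
• Many model selection methods available
• Many inclusion and stopping criteria available

Process: Partial Least Squares
Uses SAS PROC(s): PLS
Permits dependent variables of type: Binary, Ordinal, Continuous
Particularly appropriate for data with these characteristics:
• More variables than observations (wide data sets)
• Multicollinearity among predictor variables exists
Classification boundary shape for binary dependent variable: Linear or quadratic
Other classification and process characteristics:
• Linear regression model
• Simultaneously models variability in both dependent and predictor variables

Process: Partition Trees
Uses SAS PROC(s): GENESELECT
Permits dependent variables of type: Nominal, Binary, Ordinal, Continuous
Particularly appropriate for data with these characteristics: Can be represented as a hierarchy of partitions
Classification boundary shape for binary dependent variable: Step function
Other classification and process characteristics: Simple tree-based rule sets from optimal splitting relationships between dependent and predictor variables are used

Process: Quantile Regression Selection
Uses SAS PROC(s): QUANTREG
Permits dependent variables of type: Binary, Ordinal, Continuous
Particularly appropriate for data with these characteristics: The median or particular quantiles of the dependent variables are better measures of central tendency than the mean
Classification boundary shape for binary dependent variable: Linear or quadratic
Other classification and process characteristics:
• Flexible
• Many model selection methods available
• Many inclusion and stopping criteria available
• Model robustness; data robustness to outliers

Process: Radial Basis Machine
Uses SAS PROC(s): GLIMMIX
Permits dependent variables of type: Binary, Ordinal, Continuous
Particularly appropriate for data with these characteristics: More variables than observations (wide data sets)
Classification boundary shape for binary dependent variable: Any shape
Other classification and process characteristics: Dimensions of calculations are based on the number of observations, rather than the number of variables

Process: Ridge Regression
Uses SAS PROC(s): MIXED
Permits dependent variables of type: Binary, Ordinal, Continuous
Particularly appropriate for data with these characteristics:
• More variables than observations (wide data sets)
• Multicollinearity among predictor variables exists
• Continuous dependent variable
Classification boundary shape for binary dependent variable: Linear or quadratic
Other classification and process characteristics:
• Computes Best Linear Unbiased Predictions (BLUPs) of the responses based on a mixed model
• Shrinks (regresses) estimates toward a common mean

Process: Discriminant Analysis
Uses SAS PROC(s): STEPDISC, DISCRIM
Permits dependent variables of type: Nominal, Binary, Ordinal
Particularly appropriate for data with these characteristics:
• Can be represented by a multivariate normal distribution with known classes
• Fewer variables than observations
Classification boundary shape for binary dependent variable: Linear, parabolic, or S-shaped
Other classification and process characteristics: Based on Fisher discriminant analysis

Process: K Nearest Neighbors
Uses SAS PROC(s): DISCRIM
Permits dependent variables of type: Nominal, Binary, Ordinal
Particularly appropriate for data with these characteristics: Fewer variables than observations
Classification boundary shape for binary dependent variable: Any shape
Other classification and process characteristics:
• Nonparametric discriminant method
• Predictions based on the set of k training observations that are closest in feature space distance (instance-based learning)

Process: Logistic Regression
Uses SAS PROC(s): LOGISTIC
Permits dependent variables of type: Nominal, Binary, Ordinal
Particularly appropriate for data with these characteristics: Fewer variables than observations
Classification boundary shape for binary dependent variable: S-shaped
Other classification and process characteristics: Data fit to a logistic curve using a logit link function
Caution: This process can take a long time to run, depending on the number of predictor variables and the speed of your machine.

Process: Life Regression
Uses SAS PROC(s): LIFEREG
Permits dependent variables of type: Binary, Time-to-event, Censor
Particularly appropriate for data with these characteristics:
• Time-to-event data
• Data that follows one of the time-to-event distributions (Weibull, for example)
Other classification and process characteristics:
• Fits parametric models to time-to-event data
• Predictor reduction methods can be used to trim a large set of predictors

Process: Proportional Hazards Regression
Uses SAS PROC(s): PHREG
Permits dependent variables of type: Binary, Ordinal, Continuous
Particularly appropriate for data with these characteristics:
• Survival data with time-to-event variable (censor variable optional)
• Fewer variables than observations
Classification boundary shape for binary dependent variable: Exponential family
Other classification and process characteristics:
• Uses a Cox proportional hazards model
• Many model selection methods available
Caution: This process can be computationally intensive for large data sets.

Process: Genomic BLUP
Uses SAS PROC(s): MIXED, HPMIXED
Permits dependent variables of type: Binary, Continuous
Particularly appropriate for data with these characteristics: Continuous dependent variable
Classification boundary shape for binary dependent variable: Linear or quadratic
Other classification and process characteristics:
• Computes Best Linear Unbiased Predictions (BLUPs) of the responses based on a mixed model
• Shrinks (regresses) estimates toward a common mean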
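
To make one table entry concrete, the K Nearest Neighbors description (predictions based on the k training observations closest in feature space) can be sketched in a few lines of Python. This is only an illustration of the idea, not the DISCRIM procedure that the process actually runs; the function name knn_predict, the choice of Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Euclidean distance in feature space from the query to every training row.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    # Keep the k closest observations; sorted() is stable, so rows at equal
    # distance keep their original training order.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two predictors and a binary dependent variable.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["control", "control", "control", "case", "case", "case"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # prints "control"
print(knn_predict(train_X, train_y, (3.9, 4.1)))  # prints "case"
```

Because the prediction depends only on which training observations are nearby, the resulting classification boundary is not constrained to a line or curve, which is consistent with the "Any shape" entry for this process.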

Predictive Modeling Review

Click to set up a predictive modeling review, which compares how well different models, applied to one or more dependent variables, make predictions under the same conditions. The models can be compared using cross validation, test sets, or learning curves.
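
The idea of comparing models "under the same conditions" with cross validation can be sketched as follows. This is a minimal Python illustration, not how the Predictive Modeling Review is implemented; the two stand-in models (a mean-only baseline and a one-predictor least-squares line), the helper names, the RMSE criterion, and the simulated data are all assumptions chosen for the example. The key point is that every model is scored on the same folds.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares line with a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_rmse(fit, xs, ys, folds):
    """Root mean squared prediction error over the held-out folds."""
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - model(xs[i])) ** 2 for i in held_out]
    return mean(sq_errors) ** 0.5

# Simulated data (an assumption): y is roughly linear in x, plus noise.
rng = random.Random(1)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

# The same folds are reused for every model so the comparison is fair.
folds = k_fold_indices(len(xs), k=5)
results = {
    "mean only": cross_validated_rmse(fit_mean, xs, ys, folds),
    "least-squares line": cross_validated_rmse(fit_line, xs, ys, folds),
}
for name, rmse in results.items():
    print(f"{name}: RMSE = {rmse:.3f}")
```

Reusing one set of folds removes fold-to-fold variation from the comparison, so a difference in error reflects the models rather than the partitioning.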

See Predictive Modeling for other subcategories.