Filter to Include Predictor Categorical Variables
Use this field to specify a logical expression for subsetting the Predictor Categorical Variables. This is useful when you want to apply a statistical filter to reduce the initial number of candidate predictor variables passed to subsequent stages of predictive model building.
Categorical variables are coded using 0 and 1 values, so filtering criteria are usually expressed in terms of proportions (for a 0/1 variable, the MEAN is the proportion of observations coded 1). The expression must be a valid SAS WHERE clause, and it is applied separately to each predictor categorical variable.
For example, to keep only diseased individuals from a data set containing a mixed population of diseased (sick) and healthy (healthy) individuals, as indicated in a column named DiseaseStatus, you could use the following simple WHERE expression:
WHERE DiseaseStatus = 'sick'
Note: The word WHERE has already been entered for you.
For more information about using filter fields, see The SAS WHERE Expression.
Reducing the Number of Predictors:
You can use one or more of the following statistical keywords in your expression: CSS CV IQR KURTOSIS MAD MAX MEAN MEDIAN MIN NMISS SKEWNESS STD STDERR SUM USS VAR.
For example, specifying MEAN > 7 and VAR > 3 keeps only those predictor variables whose mean value across observations in the input data set is greater than 7 and whose variance is greater than 3.
You can also use the keyword _NAME_ to filter by predictor variable name. For example, specifying substr(_NAME_,1,3) ne "af_" removes all predictors whose name begins with "af_".
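The filtering logic described above can be sketched outside of SAS. The following Python example is only an illustration, not the software's implementation: the predictor names and values are hypothetical toy data, only a few of the listed statistics are computed, and sample variance (n - 1 denominator) is assumed for VAR. It summarizes each predictor separately and then applies an analogue of MEAN > 7 and VAR > 3 and substr(_NAME_,1,3) ne "af_".

```python
import statistics

# Hypothetical toy data: each key is a predictor name and each list holds
# that predictor's values across the observations in the input data set.
predictors = {
    "af_probe1": [9.0, 12.0, 6.0, 11.0],
    "gene_a": [8.0, 12.0, 6.0, 10.0],
    "gene_b": [7.5, 8.0, 7.0, 7.5],
    "gene_c": [1.0, 9.0, 2.0, 8.0],
}


def summarize(name, values):
    """Compute a few per-variable statistics that a filter could reference."""
    return {
        "_NAME_": name,
        "MEAN": statistics.mean(values),
        "VAR": statistics.variance(values),  # assumption: sample variance
        "MIN": min(values),
        "MAX": max(values),
    }


def keep(stats):
    """Analogue of: MEAN > 7 and VAR > 3 and substr(_NAME_,1,3) ne "af_"."""
    return (
        stats["MEAN"] > 7
        and stats["VAR"] > 3
        and stats["_NAME_"][:3] != "af_"
    )


# The condition is evaluated once per predictor, not once per observation.
kept = [name for name, values in predictors.items() if keep(summarize(name, values))]
print(kept)
```

In this toy data, af_probe1 satisfies both statistical conditions but is excluded by the name condition, while gene_b and gene_c each fail one of the statistical conditions.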
When you have specified a test data set or are running Cross Validation Model Comparison, the WHERE clause is applied only to the training data set. All predictors that do not satisfy the expression are then dropped from both the training and test data sets.
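The train-only behavior can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the software's code: the function name select_on_training, the variable names, and the data are all hypothetical. The statistics are computed from the training data alone, and the resulting list of kept predictors is applied to both data sets.

```python
import statistics


def select_on_training(train, test, predicate):
    """Evaluate the filter on training statistics only, then drop every
    predictor that fails it from both the training and the test data."""
    kept = [
        name
        for name, values in train.items()
        if predicate(
            {
                "MEAN": statistics.mean(values),
                "VAR": statistics.variance(values),  # assumption: sample variance
            }
        )
    ]
    return {n: train[n] for n in kept}, {n: test[n] for n in kept}


# Hypothetical data: x2 has a large mean in the test set, but the test set
# is never consulted when deciding which predictors to keep.
train = {"x1": [8.0, 12.0, 6.0, 10.0], "x2": [1.0, 2.0, 3.0, 2.0]}
test = {"x1": [0.0, 1.0], "x2": [20.0, 30.0]}

train_kept, test_kept = select_on_training(
    train, test, lambda s: s["MEAN"] > 7 and s["VAR"] > 3
)
print(sorted(train_kept), sorted(test_kept))
```

Because the decision depends only on the training data, the test set ends up with exactly the same predictor columns as the training set, regardless of the values it contains.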
Refer to Definitions of Functions and CALL Routines for details about the aforementioned statistics.