Parameters | Genetics | Filter to Include Predictor Continuous Variables

Use this field to specify a logical expression for subsetting the continuous predictor variables. This is useful when you want to apply a statistical filter that reduces the initial number of candidate predictor variables passed to subsequent stages of predictive model building.

The expression must be a valid SAS WHERE clause, and it is applied separately to each predictor variable.

For example, to filter only diseased individuals from a data set containing a mixed population of diseased (sick) and healthy (healthy) individuals (as indicated in a column named DiseaseStatus), you could use a simple WHERE expression that selects the rows whose DiseaseStatus value is sick.

Note: The word where has already been entered for you.

You can use one or more of the following statistical keywords in your expression: CSS, CV, IQR, KURTOSIS, MAD, MAX, MEAN, MEDIAN, MIN, NMISS, SKEWNESS, STD, STDERR, SUM, USS, VAR.

For example, specifying MEAN > 7 and VAR > 3 keeps only those predictor variables whose mean value across observations in the Input Data Set is greater than 7 and whose variance is greater than 3.

You can also use the keyword _NAME_ to filter by predictor variable name. For example, specifying substr(_NAME_,1,3) ne "af_" removes all predictors whose name begins with "af_".

When you have specified a Test Data Set, or when you are running Cross Validation Model Comparison, the WHERE clause is evaluated only on the training data set. All predictors that do not satisfy the expression are then dropped from both the training and test data sets.

Refer to Definitions of Functions and CALL Routines for details about the statistics listed above.
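The per-variable filtering and the training-only evaluation can be illustrated outside SAS. The following is a minimal Python sketch, not the software's implementation: the function names (select_predictors, apply_filter), the predictor names, and the data values are all hypothetical, and the variance is assumed to use the n - 1 divisor. It mimics an expression equivalent to MEAN > 7 and VAR > 3 and substr(_NAME_,1,3) ne "af_".

```python
import statistics


def select_predictors(train, keep):
    """Return names of predictors whose TRAINING statistics satisfy `keep`.

    `train` maps a predictor name to its list of values (None = missing).
    `keep` receives a dict of per-variable statistics, one variable at a time.
    """
    selected = []
    for name, values in train.items():
        observed = [v for v in values if v is not None]
        stats = {
            "_NAME_": name,
            "MEAN": statistics.mean(observed),
            # Sample variance (n - 1 divisor); assumed to match the VAR keyword.
            "VAR": statistics.variance(observed),
            "NMISS": len(values) - len(observed),
        }
        if keep(stats):
            selected.append(name)
    return selected


def apply_filter(train, test, keep):
    """Evaluate the filter on the training data only, then drop the
    failing predictors from both the training and the test data."""
    names = select_predictors(train, keep)
    kept_train = {n: train[n] for n in names}
    kept_test = {n: test[n] for n in names}
    return kept_train, kept_test


# Hypothetical data: four predictors, four training observations each.
train = {
    "gene_a": [8.0, 12.0, 10.0, 6.0],  # mean 9, variance about 6.67
    "gene_b": [7.5, 7.6, 7.4, 7.5],    # mean 7.5, variance near 0
    "af_c": [9.0, 14.0, 11.0, 6.0],    # passes the statistics, fails the name test
    "gene_d": [1.0, 5.0, 3.0, 7.0],    # mean 4
}
# Test values are never used to decide which predictors survive.
test = {
    "gene_a": [1.0, 1.0],
    "gene_b": [20.0, 40.0],
    "af_c": [9.0, 9.5],
    "gene_d": [30.0, 50.0],
}


def keep(s):
    # Python counterpart of: MEAN > 7 and VAR > 3 and substr(_NAME_,1,3) ne "af_"
    return s["MEAN"] > 7 and s["VAR"] > 3 and not s["_NAME_"].startswith("af_")


kept_train, kept_test = apply_filter(train, test, keep)
print(list(kept_train))  # ['gene_a']
print(list(kept_test))   # ['gene_a']
```

In this sketch gene_b would pass the filter if its test values were consulted, yet it is dropped from both data sets because only the training statistics are evaluated.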