Predictive Modeling
The primary focus of JMP Clinical is scientific discovery and understanding through statistics and graphics. However, the software does offer some basic capabilities for creating predictive models.
Constructing predictors of either continuous or categorical outcomes
Predictive modeling is also known as exploratory modeling or data mining. These documentation pages cover the JMP Clinical processes for exploratory and basic data mining of clinical data. For advanced, enterprise-scale data mining, SAS Enterprise Miner software offers a full spectrum of methods and a convenient, workflow-style interface. After the clinical data have been appropriately preprocessed and stored as a wide SAS data set, you can run one or more of these processes to perform exploratory data mining. The same data set can also be used with Enterprise Miner to obtain more rigorous results and scoring rules.
Data Sets
All of the processes described in these pages require that data be in wide format, with individual samples as rows and experimental design variables as columns. Any data that are in tall form must be converted to the wide format. Use the Transpose Tall to Wide command to convert the tall data set and its accompanying Experimental Design Data Set (EDDS) to wide form.
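The reshaping described above can be illustrated outside JMP Clinical with a minimal Python sketch. It is not the Transpose Tall to Wide process itself. It assumes a tall table with one row per predictor and one column per sample, plus an EDDS with one row per sample; the names tall, edds, tall_to_wide, and all column names and values are hypothetical.

```python
# Hypothetical tall table: one row per predictor, one column per sample.
tall = {
    "gene_A": {"S1": 2.1, "S2": 3.4, "S3": 1.8},
    "gene_B": {"S1": 0.5, "S2": 0.7, "S3": 0.9},
}

# Hypothetical EDDS: one row per sample, holding the design variables.
edds = [
    {"sample": "S1", "treatment": "placebo"},
    {"sample": "S2", "treatment": "drug"},
    {"sample": "S3", "treatment": "drug"},
]


def tall_to_wide(tall, edds, id_column="sample"):
    """Return one row per sample: design variables first, then predictors."""
    wide = []
    for design_row in edds:
        sample_id = design_row[id_column]
        row = dict(design_row)  # copy so the EDDS rows are not modified
        for predictor, values in tall.items():
            # Samples absent from the tall table get a missing value (None).
            row[predictor] = values.get(sample_id)
        wide.append(row)
    return wide


wide = tall_to_wide(tall, edds)
for row in wide:
    print(row)
```

Each resulting row describes one sample, with the design variables and every predictor as columns, which is the layout the predictive modeling processes expect.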
If you have multiple tables containing different forms of data on the same set of samples (for example, both genetic marker and microarray data), merge them into a single wide data set using the Clinical > SAS Data Set Utilities > Tables > Merge process, as described in Merge. These data can then be used together to build jointly predictive models. We recommend that you preprocess and analyze the different data types separately and combine them just prior to predictive modeling.
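As a rough illustration of what such a merge produces, the following Python sketch joins two wide tables on a shared sample identifier. It is not the Merge process itself. The function merge_wide, the key column sample, and the example values are hypothetical, and the sketch assumes that only samples present in both tables are kept (an inner join); the actual process may offer other options.

```python
def merge_wide(left, right, key="sample"):
    """Inner-join two wide tables (lists of dicts) on a shared sample key."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # keep only samples present in both tables
        combined = dict(row)
        for name, value in match.items():
            if name != key:
                # A column with the same name in both tables takes
                # the value from the right-hand table.
                combined[name] = value
        merged.append(combined)
    return merged


# Hypothetical marker and expression tables for the same samples.
markers = [
    {"sample": "S1", "snp_1": 0},
    {"sample": "S2", "snp_1": 2},
    {"sample": "S3", "snp_1": 1},
]
expression = [
    {"sample": "S2", "gene_A": 3.4},
    {"sample": "S1", "gene_A": 2.1},
]

merged = merge_wide(markers, expression)
for row in merged:
    print(row)
```

Sample S3 has no expression data in this example, so it does not appear in the merged table.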
For large data sets with tens or hundreds of thousands of predictors, computing time for some of the JMP Clinical predictive modeling processes can become prohibitively long. In this situation, perform a preliminary reduction of the predictor set by using the Clinical > Pattern Discovery > K-Means Clustering process to select a thousand or so representative predictors. (The data must be in tall form to execute this process. Use the Transpose Wide to Tall and Transpose Tall to Wide processes to go back and forth between tall and wide forms.)
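The idea of reducing a predictor set by clustering can be sketched in plain Python. This is a toy version of k-means, not the K-Means Clustering process, and it makes one assumption that the text above does not state: the representative of each cluster is taken to be the predictor closest to the cluster centroid. The function names and the example profiles are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans_representatives(profiles, k, iterations=20, seed=0):
    """Cluster predictors and return one representative name per cluster.

    profiles maps a predictor name to its list of values across samples.
    """
    names = sorted(profiles)
    rng = random.Random(seed)
    # Start from k randomly chosen predictors as initial centroids.
    centroids = [list(profiles[name]) for name in rng.sample(names, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for name in names:
            nearest = min(
                range(k), key=lambda i: distance(profiles[name], centroids[i])
            )
            clusters[nearest].append(name)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_vector([profiles[n] for n in members]) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    representatives = []
    for i, members in enumerate(clusters):
        if members:
            # Assumption: the member nearest the centroid represents the cluster.
            representatives.append(
                min(members, key=lambda n: distance(profiles[n], centroids[i]))
            )
    return sorted(representatives)


# Hypothetical predictor profiles forming two well-separated groups.
profiles = {
    "p1": [1.0, 1.1, 0.9],
    "p2": [1.1, 1.0, 1.0],
    "p3": [0.9, 1.0, 1.1],
    "p4": [9.0, 9.1, 8.9],
    "p5": [9.1, 9.0, 9.0],
    "p6": [8.9, 9.0, 9.1],
}
selected = kmeans_representatives(profiles, k=2)
print(selected)
```

Note that each predictor is treated as one observation to be clustered, with samples as the dimensions, which corresponds to the tall layout mentioned above.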
Performing variable selection (or reduction) on an entire data set can introduce an optimistic bias in subsequent analyses. To compensate for this, hold out a fraction of the data from the beginning and use it for subsequent prediction. Many of the processes have built-in cross validation capabilities to help prevent selection bias. Alternatively, you can perform cross validation manually by creating one or more new columns that are copies of the variables being predicted and then setting subsets of them to missing values. Although the ultimate test of the generalizability of any predictive model is with new data from an independent research center, computer-based cross validation is invaluable in assessing the initial performance of the models.
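The manual approach to cross validation can be sketched as follows. This Python example adds several copies of an outcome column and sets a different subset of each copy to missing. The function add_holdout_columns, the column names, and the fold layout are hypothetical. The sketch also assumes that a model fit to one of the copies ignores rows whose outcome is missing, so that its predictions for those rows can be compared with the original column.

```python
import random


def add_holdout_columns(rows, outcome, n_folds=5, seed=0):
    """Add n_folds copies of the outcome column, each with one fold set to None."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    # Assign every row to exactly one fold, keeping fold sizes balanced.
    fold_of = {index: position % n_folds for position, index in enumerate(order)}
    result = []
    for index, row in enumerate(rows):
        new_row = dict(row)  # the original outcome column is left unchanged
        for fold in range(n_folds):
            name = f"{outcome}_cv{fold + 1}"
            # The copy is missing for rows held out in this fold.
            new_row[name] = None if fold_of[index] == fold else row[outcome]
        result.append(new_row)
    return result


# Hypothetical wide data: ten samples with a binary response.
rows = [{"sample": f"S{i}", "response": i % 2} for i in range(1, 11)]
with_folds = add_holdout_columns(rows, "response", n_folds=5)
for row in with_folds:
    print(row)
```

Each sample is held out in exactly one of the copied columns, so every observation is predicted once by a model that did not see its outcome.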
Please consult the subcategory documentation pages, as well as the documentation on individual processes, for additional information.
See the JMP Clinical Starter main page for other process categories.