Predictive Modeling and Cross-Validation

Anyone can do a fair job of describing last year's performance. But without the right tools and the most modern techniques, building a model to predict what will happen with new customers, new processes or new risks becomes much more difficult. JMP Pro includes a rich set of algorithms for building better models of your data. Some of the most useful techniques for predictive modeling are decision trees, bootstrap forest, Naive Bayes and neural networks.

The Partition platform in JMP Pro automates the tree-building process with modern methods. This platform also fits K nearest neighbors (K-NN) models.

The bootstrap forest, which uses a random-forest technique, grows dozens of decision trees using random subsets of the data and averages the computed influence of each factor in these trees. The boosted tree technique builds many simple trees, repeatedly fitting any residual variation from one tree to the next.
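The residual-fitting idea behind boosted trees can be illustrated with a minimal sketch. It is not JMP Pro's implementation: the function names, the single-predictor setup, the one-split trees and the learning rate are all assumptions made for illustration.

```python
# Minimal sketch of boosting with one-split trees ("stumps") on one predictor.
# All names and defaults are hypothetical; this is not JMP Pro's algorithm.

def fit_stump(x, resid):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # every cut that leaves both sides non-empty
        left = [r for xi, r in zip(x, resid) if xi <= t]
        right = [r for xi, r in zip(x, resid) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def predict_stump(stump, xi):
    t, lm, rm = stump
    return lm if xi <= t else rm

def boost(x, y, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for xi, pi in zip(x, pred)]
    return base, stumps, pred

def sum_squared_error(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
```

Each pass fits only what the previous trees left unexplained, so the training error shrinks gradually. In practice, a validation set decides when to stop adding trees.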

The Naive Bayes platform uses the principles of Bayes’ Theorem to allow you to predict a categorical response. The platform even allows predictions for combinations of predictors that do not appear in your data.
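A small sketch shows why a Naive Bayes model can score combinations it has never seen: it multiplies per-predictor probabilities instead of looking up the whole combination. The function names and the add-one (Laplace) smoothing are assumptions for illustration, not a description of how JMP Pro computes its estimates.

```python
from collections import Counter, defaultdict

# Hypothetical helper names; add-alpha smoothing is an illustrative assumption.

def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-predictor level frequencies by class."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    value_counts = [defaultdict(Counter) for _ in range(n_features)]
    levels = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[j][c][v] += 1
            levels[j].add(v)
    return class_counts, value_counts, levels, alpha

def predict_proba(model, row):
    """Posterior class probabilities: prior times smoothed conditionals."""
    class_counts, value_counts, levels, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total
        for j, v in enumerate(row):
            p *= (value_counts[j][c][v] + alpha) / (n_c + alpha * len(levels[j]))
        scores[c] = p
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = fit_naive_bayes(rows, labels)
# ("sunny", "cool") never occurs in the training rows, yet it can be scored.
probs = predict_proba(model, ("sunny", "cool"))
```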

The advanced Neural platform lets you build one- or two-layer neural networks with your choice of three activation functions, and it can construct models automatically using gradient boosting. The platform also handles missing values and transforms continuous X’s automatically, which saves time and effort, and it includes robust fitting options.

Each of these platforms in JMP Pro uses cross-validation, which checks how well a model predicts data that were not used to fit it. Effective predictive modeling needs sound validation, because a large model can easily over-fit. Large models should always be cross-validated, and JMP Pro does this through data partitioning, or holdback. Cross-validation helps you build models that generalize well to tomorrow’s data – about new customers, new processes or new risks – so you can make data-driven inferences about the future.

Dividing the data into training, validation and test data sets has long been used to avoid over-fitting, ensuring that the models you build are not reliant on the properties of the specific sample used to build them. The general approach to cross-validation in JMP Pro is to use a validation column. You can easily split your data into different sets for different purposes using the validation column utility (either with a purely random sample or stratified random).
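As a rough illustration of what a stratified random validation column contains, the sketch below assigns each row to a training, validation or test set while keeping the proportions within every stratum. The function name, the 60/20/20 fractions and the label strings are assumptions, not the behavior of the JMP utility.

```python
import random
from collections import defaultdict

def make_validation_column(strata, fractions=(0.6, 0.2, 0.2), seed=1):
    """Return one label per row, drawn at random within each stratum.

    `strata` holds the stratification value for every row; `fractions`
    gives the training and validation shares (the remainder is test).
    """
    rng = random.Random(seed)
    labels = [None] * len(strata)
    rows_by_stratum = defaultdict(list)
    for i, s in enumerate(strata):
        rows_by_stratum[s].append(i)
    for rows in rows_by_stratum.values():
        rng.shuffle(rows)
        n_train = round(fractions[0] * len(rows))
        n_valid = round(fractions[1] * len(rows))
        for position, i in enumerate(rows):
            if position < n_train:
                labels[i] = "Training"
            elif position < n_train + n_valid:
                labels[i] = "Validation"
            else:
                labels[i] = "Test"
    return labels
```

A purely random split is the special case in which every row shares the same stratum value.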

The training set is used to build the model(s), and the validation set is used during model building to help choose how complex the model should be. Finally, the test set is held out completely from the model-building process and used to assess the quality of the model(s). For smaller data sets, k-fold cross-validation can also be used. This process helps you build models that generalize to new data effectively.
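For the k-fold option mentioned above, the data are divided into k parts and each part takes a turn as the held-out set. The sketch below only generates the row indices. The function name and defaults are hypothetical, and the model fitting itself is left out.

```python
import random

def k_fold_indices(n_rows, k=5, seed=1):
    """Yield (training_rows, held_out_rows) index lists, one pair per fold."""
    rng = random.Random(seed)
    order = list(range(n_rows))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k nearly equal, disjoint parts
    for i, held_out in enumerate(folds):
        training = [row for m, fold in enumerate(folds) if m != i for row in fold]
        yield training, held_out
```

Every row is held out exactly once, so all of a small data set contributes to both fitting and validation.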

It is important to consider that observational data can only take you so far. To truly understand cause and effect, many times you may wish to employ design of experiments (DOE). JMP provides world-class tools for optimal DOE in a form you can easily use.

Model Comparison

In the real world, some kinds of models fit well in certain situations but fit poorly in others. With JMP Pro, there are many ways to fit, and you need to find out which one is most appropriate in a given situation. A typical approach to model building is that you will try many different models: models with more or less complexity, models with or without certain factors/predictors, models built using different kinds of modeling methods or even averages of multiple models (ensemble models).

Each of these models will have common quality measures that can be used to assess the model: R², misclassification rate, ROC curves, AUC, lift curves, etc.
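Three of these measures are simple enough to sketch directly. The definitions below are the standard textbook ones and the function names are illustrative. They are not taken from JMP Pro, whose reports may use different conventions (for example, for ties or for categorical responses).

```python
def misclassification_rate(actual, predicted):
    """Share of rows whose predicted class differs from the actual class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def auc(actual, scores):
    """Chance that a random positive (1) outscores a random negative (0).

    Ties count as half a win; this equals the area under the ROC curve.
    """
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Computing the same measures on held-out rows, rather than on the training rows, is what makes the comparison between models fair.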

Using model comparison in JMP Pro, you can compare all the saved prediction columns from various fits and pick the best combination of goodness of fit, parsimony and cross-validation. JMP Pro makes this comparison automatically. At the same time, you can interact with visual model profilers to see which important factors each model is picking up. Model comparison in JMP Pro makes it easy to compare multiple models at the same time, and also to do simple model averaging, if desired.

Formula Depot and Generate Scoring Code

Managing your models doesn’t have to be painful – the Formula Depot in JMP Pro organizes your work when dealing with many models. This central repository lets you store, profile, compare and selectively deploy JMP Pro models in C, SQL, SAS or other languages.

Now, when building multiple models, your data tables are no longer weighed down with numerous extra columns of prediction formulas needed to perform model comparison. The score code can be saved to the Formula Depot and applied to new data. The result is a central modeling hub for easy access to your models and simple deployment to other systems.

Connect to the Richness of SAS®

As one of the SAS offerings for predictive analytics and data mining, JMP Pro easily connects to SAS, expanding options and giving access to the unparalleled depth of SAS Analytics and data integration. With or without an active SAS connection, JMP Pro can output SAS code to score new data quickly and easily with models built in JMP.

Modern Modeling

Generalized regression is a class of new modeling techniques well suited to building better models, even with challenging data. It fits generalized linear models using regularized or penalized regression methods.

Standard estimation techniques break down when you have predictors that are strongly correlated or more predictors than observations. And when there are many correlated predictors (as is often the case in observational data), stepwise regression or other standard techniques can yield unsatisfactory results. Such models are often over-fit and generalize poorly to new data. But how do you decide which variables to cull before modeling – or, worse, how much time do you lose manually preprocessing data sets in preparation for modeling?

The Generalized Regression personality in Fit Model is an all-inclusive approach to doing regression. It’s a complete modeling framework from variable selection through model diagnostics to LS means comparisons, inverse prediction and profiling. And it’s only in JMP Pro.

The regularization techniques available within the Generalized Regression personality include Ridge, Lasso, adaptive Lasso, Elastic Net and the adaptive Elastic Net to help better identify X’s that may have explanatory power. Harnessing these techniques is as easy as using any other modeling personality in Fit Model – simply identify your response, construct model effects and pick the desired estimation and validation method. JMP automatically fits your data, performs variable selection when appropriate, and builds a predictive model that can be generalized to new data. You can also use a forward stepwise technique, perform quantile regression or simply fit using maximum likelihood.
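To show how a penalty performs variable selection, here is a minimal Lasso sketch that uses coordinate descent. It assumes a centered response and predictors (so there is no intercept) and minimizes (1/2n)·Σ(y − Xβ)² + λ·Σ|β|. The function names and this particular scaling of the penalty are assumptions for illustration, not JMP Pro's fitting algorithm.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the Lasso (no intercept, centered data)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of predictor j with the residual that ignores j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return beta

# Two orthogonal predictors: a strong effect (2.0) and a weak one (0.1).
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso(X, y, lam=0.5)  # the weak predictor is dropped entirely
```

The soft-threshold step is what sets small coefficients to exactly zero. A ridge penalty would instead shrink every coefficient without removing any.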

Finally, Generalized Regression gives options to choose the appropriate distribution for the response you are modeling, letting you model more diverse responses such as counts, data with many outliers, or skewed data. And like all the advanced modeling platforms in JMP Pro, you have your choice of cross-validation techniques.

Reliability Block Diagram

Often, you are faced with analyzing the reliability of a more complex analytical system – a RAID storage array with multiple hard drives, or an airplane with four engines, for example. With JMP, you have many tools to analyze the reliability of single components within those systems. But with JMP Pro, you can take the reliability of single components, build a complex system of multiple components and analyze the reliability of the entire system. Using the Reliability Block Diagram, you can easily design your system and fix its weak spots – and be better informed to prevent future system failures.

With this platform, you can easily perform what-if analyses by looking at different designs and comparing plots across multiple system designs. You can also determine the best places to add redundancy and decrease the probability of a system failure.
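The arithmetic behind such what-if comparisons can be sketched for the simplest case, where components fail independently and each has a fixed probability of surviving the mission. The function names are illustrative, and the independence assumption is a simplification rather than a statement about how JMP Pro models components.

```python
from math import comb

def series(reliabilities):
    """A series system works only if every component works."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

def parallel(reliabilities):
    """A parallel (redundant) system fails only if every component fails."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1 - r
    return 1 - all_fail

def k_out_of_n(k, n, r):
    """At least k of n identical components, each with reliability r, work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: an airplane that needs 2 of its 4 engines, each 0.9.
engines = k_out_of_n(2, 4, 0.9)
```

Comparing `series([0.9, 0.9])` with `parallel([0.9, 0.9])` shows how much adding redundancy raises system reliability.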

Repairable Systems Simulation

Some systems or components of complex systems are too costly to have offline for very long. Maintaining the integrity of these systems requires you to schedule repairs for system components or maximize the benefit realized by an unplanned outage by completing additional repairs while the system is unavailable. With JMP Pro, you can use the Repairable Systems Simulation to determine how long a system will be unavailable and answer key questions such as how many repair events to expect in a given period of time and how much a repair event will cost.
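A toy version of this kind of simulation tracks a single component that alternates between running and being repaired. The exponential failure and repair times, the fixed cost per repair and every name in the sketch are assumptions for illustration. The platform itself handles whole systems and richer maintenance rules.

```python
import random

def simulate_component(horizon, mean_time_to_failure, mean_repair_time,
                       cost_per_repair, seed=1):
    """Simulate one run-fail-repair history up to `horizon` time units."""
    rng = random.Random(seed)
    clock = 0.0
    downtime = 0.0
    repairs = 0
    while True:
        clock += rng.expovariate(1 / mean_time_to_failure)  # run until failure
        if clock >= horizon:
            break
        # Repair time, truncated if the observation window ends first.
        repair = min(rng.expovariate(1 / mean_repair_time), horizon - clock)
        downtime += repair
        clock += repair
        repairs += 1
    return {"repairs": repairs, "downtime": downtime,
            "cost": repairs * cost_per_repair}

def average_downtime(n_runs, **settings):
    """Average downtime over many simulated histories with different seeds."""
    runs = [simulate_component(seed=s, **settings) for s in range(n_runs)]
    return sum(run["downtime"] for run in runs) / n_runs
```

Repeating the simulation many times gives a distribution for downtime, repair counts and cost rather than a single guess.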

Covering Arrays

Covering arrays are used in testing applications where factor interactions may lead to failures and each experimental run may be costly. As a result, you need to design an experiment to maximize the probability of finding defects while also minimizing cost and time. Covering arrays let you do just that. JMP Pro lets you design an experiment to test deterministic systems and cover all possible combinations of factors up to a certain order of interactions.
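The coverage property can be checked mechanically: for every group of t factors, each combination of their levels must appear in at least one run. The sketch below tests that property for a given set of runs. The function name and data layout are illustrative, and it only verifies a design rather than constructing one.

```python
from itertools import combinations, product

def covers_all(runs, levels, strength=2):
    """True if every level combination of every `strength` factors appears.

    `runs` is a list of tuples (one setting per factor); `levels` lists the
    allowed settings of each factor.
    """
    for cols in combinations(range(len(levels)), strength):
        needed = set(product(*(levels[c] for c in cols)))
        seen = {tuple(run[c] for c in cols) for run in runs}
        if not needed <= seen:
            return False
    return True

# Three two-level factors: 4 runs cover every pair, versus 8 for all triples.
levels = [(0, 1), (0, 1), (0, 1)]
runs = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Here four runs exercise every two-factor combination, half the cost of the eight-run full factorial, which is the saving that grows quickly as factors are added.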

And when there are combinations of factors that create implausible conditions, you can use the interactive Disallowed Combinations filter to automatically exclude these combinations of factor settings from the design.

One of the huge advantages of covering arrays in JMP Pro is that JMP Pro is a statistical analysis tool, not just a covering arrays design tool, so you can do all sorts of statistical analyses on your results. For example, there is currently no other software for covering arrays design that also lets you analyze your data using generalized regression.

JMP Pro is not just strictly a design tool; it also allows you to import any covering array design – generated by any software – and further optimize it and analyze the results. You can design the arrays yourself without having to rely on others to build experiments for you. Test smarter with covering arrays in JMP Pro.

Mixed Models

Mixed models contain both fixed effects and random effects in the analysis. These models let you analyze data that involves both time and space. For example, you might use mixed models in a study design where multiple subjects are measured at multiple times during the course of a drug trial, or in crossover designs in the pharmaceutical, manufacturing or chemical industries.

JMP Pro lets you fit mixed models to your data, letting you specify fixed, random and repeated effects; correlate groups of variables; and set up subject and continuous effects – all with an intuitive drag-and-drop interface.

In addition, you can now calculate the covariance parameters for a wide variety of correlation structures. One example is when the experimental units on which the data is measured can be grouped into clusters, and the data from a common cluster is correlated. Another is when repeated measurements are taken on the same experimental unit, and these repeated measurements are correlated or exhibit variability that changes.

It is also easy to visually determine which, if any, spatial covariance structure is appropriate to utilize in your model specification when building mixed models in JMP Pro.

Uplift Models

You may want to maximize the impact of your limited marketing budget by sending offers only to individuals who are likely to respond favorably. But that task may seem daunting, especially when you have large data sets and many possible behavioral or demographic predictors. Here is where uplift models can help. Also known as incremental modeling, true-lift modeling or net modeling, uplift models have been developed to help optimize marketing decisions, define personalized medicine protocols or, more generally, to identify characteristics of individuals who are likely to respond to some action.

Uplift modeling in JMP Pro lets you make these predictions. JMP Pro fits partition models that find splits to maximize a treatment difference. The models help identify groups of individuals who are most likely to respond favorably to an action, leading to efficient, targeted decisions that optimize resource allocation and impact on the individual.
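The notion of a split that maximizes a treatment difference can be illustrated with a simplified sketch. Uplift is taken here as the response rate of treated individuals minus that of controls, and a candidate split is scored by how far apart the uplift on its two sides is. That scoring rule, the field names and the function names are assumptions for illustration and may differ from the criterion JMP Pro uses.

```python
def response_rate(rows):
    return sum(r["responded"] for r in rows) / len(rows)

def uplift(rows):
    """Response rate of treated rows minus response rate of control rows."""
    treated = [r for r in rows if r["treated"]]
    control = [r for r in rows if not r["treated"]]
    return response_rate(treated) - response_rate(control)

def best_uplift_split(rows, feature):
    """Return (threshold, gap) for the cut with the largest uplift difference."""
    best = None
    for t in sorted({r[feature] for r in rows})[:-1]:
        left = [r for r in rows if r[feature] <= t]
        right = [r for r in rows if r[feature] > t]
        try:
            gap = abs(uplift(left) - uplift(right))
        except ZeroDivisionError:  # a side has no treated or no control rows
            continue
        if best is None or gap > best[1]:
            best = (t, gap)
    return best

# Made-up data: the offer helps the two youngest customers, not the oldest.
rows = [
    {"age": 25, "treated": True, "responded": 1}, {"age": 25, "treated": False, "responded": 0},
    {"age": 30, "treated": True, "responded": 1}, {"age": 30, "treated": False, "responded": 0},
    {"age": 50, "treated": True, "responded": 1}, {"age": 50, "treated": False, "responded": 1},
    {"age": 60, "treated": True, "responded": 0}, {"age": 60, "treated": False, "responded": 0},
]
split = best_uplift_split(rows, "age")
```

An ordinary response model would favor customers who respond either way. The uplift view singles out those whose behavior changes because of the offer.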

Advanced Computational Statistics

JMP Pro includes exact statistical tests for contingency tables and exact non-parametric statistical tests for one-way ANOVA. Additionally, JMP Pro includes a general method for bootstrapping statistics in most JMP reports.

Bootstrapping approximates the sampling distribution of a statistic. JMP Pro is the only statistical software package that lets you bootstrap a statistic without writing a single line of code. One-click bootstrapping means you are only a click away from being able to bootstrap any quantity in a JMP report.

This technique is useful when textbook assumptions are in question or don’t exist. For example, try applying bootstrapping techniques to nonlinear model results that are being used to make predictions or determining coverage intervals around quantiles. Also, you can use bootstrapping as an alternative way to gauge the uncertainty in predictive models. Bootstrapping lets you assess the confidence in your estimates with fewer assumptions – and one-click bootstrapping in JMP Pro makes it easy.
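For readers curious about what happens behind the click, a percentile bootstrap can be sketched in a few lines: resample the rows with replacement, recompute the statistic each time, and read an interval off the sorted results. The function name, the number of resamples and the percentile method are assumptions for illustration and are not claimed to match JMP Pro's defaults.

```python
import random
from statistics import median

def bootstrap_interval(data, statistic, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for `statistic` computed on `data`."""
    rng = random.Random(seed)
    stats = sorted(
        statistic(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_boot)
    )
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper

# Made-up sample; the median has no simple textbook standard error.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15, 21]
low, high = bootstrap_interval(sample, median)
```

The same function works for any statistic that can be recomputed from a resampled data set, which is what makes the approach useful when no formula for the uncertainty is available.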

Share and Communicate Results

Dow Chemical has adopted JMP Pro for its workforce because decision makers want the best tool available for exploring large data sets and efficiently extracting from them as much information as possible.


JMP has always been about discovery and finding the best way of communicating those discoveries across your organization. JMP Pro includes all the visual and interactive features of JMP, making your data accessible in ways you might never have experienced. Through dynamically linked data, graphics and statistics, JMP Pro brings your investigation alive in a 3-D plot or an animated graph showing change over time, generating valuable new insights that inform both the model-building and explanation process.

Key Features Exclusive to JMP® Pro

JMP Pro includes all of the features in JMP, plus the additional capabilities for advanced analytics listed below.

Predictive Modeling and Cross-Validation

Neural Network Modeling
  • Automated handling of missing data.
  • Automatic selection of the number of hidden units using gradient boosting.
  • Fit both one- and two-layer neural networks.
  • Automated transformation of input variables.
  • Three activation functions (Hyperbolic Tangent, Linear, Gaussian).
  • Save randomly generated cross-validation columns.
  • Save transformed covariates.
  • Support for validation column.
Recursive Partition Modeling
  • Choice of methods: Decision tree, Bootstrap forest (a random-forest technique), Boosted tree, K nearest neighbors, Naive Bayes.
  • Set random seed, suppress multithreading, use tuning design table, stochastic gradient descent available in Boosted Trees and Bootstrap Forest.
  • Support for validation column.
  • Dedicated model launch options for: Bootstrap forest, Boosted tree, K nearest neighbors and Naive Bayes.
Model Comparison
  • Compare models built in JMP Pro.
  • Profiler.
  • Fit statistics (R², Misclassification Rate, ROC curves, AUC, Lift Curves).
  • Model averaging.
Make Validation Column
  • Automatic partitioning of data into training, validation and test portions; creation of validation columns.
  • Formula random, fixed random, stratified random, grouped random, cutpoint methods to create the holdback sets.
  • Validation column creation from platform launch by clicking validation column role (Formula random only).
Formula Depot
  • Stores and manages Formula Column scripts.
  • Publish commands available for Discriminant, Fit Least Squares (7 commands), Fit Logistic (Nominal and Ordinal), Decision tree, Bootstrap forest, Boosted trees, Uplift, K nearest neighbors, Naive Bayes, Neural, Latent class analysis, Principal components (wide and sparse), Generalized regression, PLS, Gaussian process.
  • Generate Score Code: SAS (DS2), C, Python, JavaScript, SQL (with choice of syntax options for different destinations).
  • Direct comparison of models collected in the Formula Depot by using Model Comparison.
  • Profiler.
  • Show script, copy script, copy formula, copy formula as column transform, run script to generate formula column in the data table.
  • Add formulas from data table columns.

Text Explorer Analytics

  • Latent Class Analysis.
  • Latent Semantic Analysis (Sparse SVD).
  • Topic Analysis (Rotated SVD).
  • Cluster Terms and Documents.
  • SVD and Topic Scatterplot Matrix.
  • Save Columns: Document Singular and Topic Vectors, Stacked DTM for Association.
  • Save Formula: Singular Vector, Topic Vector.
  • Save Vectors: Term and Topic.

Reliability and Survival Models

Reliability Block Diagram (RBD)
  • Build models of complex system reliability.
  • Use basic, serial, parallel, knot, and K out of N nodes to build systems.
  • Build nested designs using elements from design library.
Repairable System Simulation (RSS)
  • Discrete event simulation based engine.
  • Supports traditional maintenance (corrective and preventive) as off-the-shelf building blocks.
  • Diagrammatic representation of maintenance arrangements alongside an RBD in a single workspace.
  • Diagrammatic linkages between event and action elements across components to convey the idea of grouped maintenance and maintenance dependencies.
Parametric Survival
  • Supports variable selection through a bridge to the Generalized Regression personality of Fit Model.
Generalized Regression
  • Handles censored data allowing you to do variable selection with survival/reliability data.
  • Support for Cox Proportional Hazards.
  • Supports Weibull, LogNormal, Exponential, Gamma, Normal and ZI family of Distributions.

Fit Model

Generalized Regression
  • Regularization techniques: Ridge, Lasso, adaptive Lasso, Double Lasso, Elastic Net, adaptive Elastic Net.
  • Forward selection and Two-Stage Forward Selection.
  • Quantile regression.
  • Handles censored data allowing you to do variable selection with survival/reliability data.
  • Cox Proportional Hazards.
  • Save simulation formula for use in the general simulation platform.
  • Normal, LogNormal, Weibull, Cauchy, exponential, Gamma, Beta, binomial, Beta binomial, Poisson, negative binomial distribution.
  • Zero inflated binomial, Beta binomial, Poisson, negative binomial, Gamma distribution.
  • Choice of validation methods: Validation column, KFold, holdback, leave-one-out, BIC, AICc, ERIC.
Stepwise Regression
  • Support for validation column.
Logistic Regression (Nominal and Ordinal)
  • Support for validation column.
Standard Least Squares
  • Support for validation column.
Partial Least Squares (PLS)
  • PLS personality in Fit Model supports continuous or categorical response; continuous or categorical factors, interactions and polynomial terms.
  • NIPALS-style missing value imputation.
  • Save randomly generated cross-validation columns.
  • A Standardize X option, which centers and scales individual variables that are included in a polynomial effect prior to applying the centering and scaling options.
  • Choice of validation methods: Validation column, KFold, holdback, leave-one-out.
Mixed Models
  • Specify fixed, random and repeated effects.
  • Correlate groups of variables, set up subject and continuous effects.
  • Choice of repeated covariance structure.
  • Variograms serve as a visual diagnostic to determine which, if any, spatial correlation structure is most appropriate.

Covering Arrays

  • Design and analyze covering arrays.
  • Optimize designs after they are created for further run reduction.
  • Use disallowed combinations filter to specify infeasible testing regions.
  • Import covering arrays created by other software; analyze coverage and optionally further optimize.

Multivariate Methods

Discriminant analysis
  • Support for validation column.

Specialized Models

Gaussian process
  • Ability to fit models with thousands of rows through fast GASP.
  • Add categorical variables to your Gaussian process models.

Consumer Research

Uplift Models
  • Decision tree method to identify the consumer segments most likely to respond favorably to an offer or treatment.
  • Incremental, true-lift, net modeling technique.
  • Support for validation column.
Choice Models
  • Support for Hierarchical Bayes in Choice.
  • Save subject estimates and Bayes Chain.
Association Analysis
  • Support for market basket analysis.
  • Analyze stacked document term matrix generated by the Text Explorer platform.

Advanced Computational Statistics

Oneway Analysis
  • Nonparametric exact tests.
Contingency Analysis
  • Exact measures of association.
General Bootstrapping
  • Bootstrap statistics in most reports in a single click.
General Simulation Functionality
  • Single-click simulate statistics in most reports.
  • Power calculations on almost anything.
  • Support for parametric bootstrapping.
  • Randomization testing.

System Requirements

JMP runs on Microsoft Windows and Mac OS.