Introduction to Predictive Modeling
What is predictive modeling?
Predictive modeling, also called predictive analytics, uses data and statistical algorithms to predict what is likely to happen next, given the current process and environment. It is the middle stage of the descriptive, predictive, and prescriptive analytics spectrum.
What are some widely used predictive modeling methods?
Predictive models fall into two general categories: supervised and unsupervised. With supervised methods, you are interested in predicting values of an output variable based on a collection of input variables. Methods for supervised learning include multiple linear regression, logistic regression, decision trees, neural networks, and others.
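As a rough illustration of the supervised setting, the sketch below predicts a categorical output from two continuous inputs using k nearest neighbors, one of the methods listed later on this page. The data values, the labels "low" and "high", and the function name knn_predict are all made up for illustration; this is a minimal sketch, not how any particular software implements the method.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, new_x, k=3):
    """Predict a categorical response by majority vote of the k closest training rows."""
    # Rank the training rows by Euclidean distance to the new observation.
    ranked = sorted(zip(train_x, train_y), key=lambda row: dist(row[0], new_x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two continuous inputs and a known categorical output.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["low", "low", "low", "high", "high", "high"]

# Predict the output for two new observations whose output is unknown.
prediction_a = knn_predict(train_x, train_y, (2.9, 3.1))
prediction_b = knn_predict(train_x, train_y, (1.0, 0.9))
```

The key feature of a supervised method is visible in the function signature: the known outputs (train_y) are supplied alongside the inputs, and the model's job is to reproduce that relationship for new rows.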
With unsupervised methods, you study a collection of variables with no known or observed response variables. Methods for unsupervised learning include principal components analysis, cluster analysis, factor analysis, and others.
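To make the contrast with supervised methods concrete, here is a minimal sketch of k-means clustering, assuming made-up two-dimensional data and hand-picked starting centers. The function name k_means and every value below are hypothetical. Note that no response variable appears anywhere: the algorithm only looks for structure among the inputs.

```python
from math import dist
from statistics import mean


def k_means(points, centers, n_iter=10):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it in place if the cluster is empty.
        centers = [
            tuple(mean(coord) for coord in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical observations with no response variable.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]

# Starting centers chosen by hand for this sketch; real implementations usually randomize them.
centers, clusters = k_means(points, centers=[points[0], points[3]])
```

The result depends on the starting centers and on the number of clusters requested, so in practice the procedure is usually run several times and the solutions compared.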
Below are some of the more frequently used predictive models:
Supervised learning, continuous response
- Multiple linear regression
- Penalized regression
- Decision trees
- Neural networks
- Support vector regression
- K nearest neighbors
Supervised learning, categorical response
- Logistic regression
- Penalized logistic regression
- Decision trees
- Neural networks
- Support vector machines
- K nearest neighbors
- Discriminant analysis
- Naïve Bayes classification
Unsupervised learning
- Principal components analysis
- Hierarchical clustering
- K-means clustering
- Association analysis/market basket analysis
- Factor analysis
Example data set for predictive modeling
Other pages in this section discuss predictive modeling techniques. The data they use are described here.
Let’s say that you own a process that recovers a chemical substance from a substrate. You would like to find process settings that maximize the yield (a continuous response variable) and maximize the quality (a categorical response variable) of the substance recovered. You measure many process variables for each batch, both continuous and categorical.
If you have JMP on your computer, you can download the JMP data set Recovery.jmp for your own analysis as you go through these predictive modeling pages. (If you don’t have access to JMP, a free trial is available for download.)
You can fit a main effects model (one term per input variable, with no interactions) to the response Percent Recovered, using a validation set to assess the model honestly on data that was not used to fit it.
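Outside JMP, the same idea can be sketched in a few lines of Python. The example below does not use Recovery.jmp: it generates synthetic stand-in data with two hypothetical inputs, holds out 30% of the rows as a validation set, fits a main effects model by least squares on the remaining rows, and compares R-squared on the two sets. All names, coefficients, and the 70/30 split are assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_main_effects(rows, y):
    """Least squares with an intercept and one term per input (no interactions)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(coefs, rows, y):
    y_bar = sum(y) / len(y)
    sse = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic stand-in data (NOT the Recovery.jmp data): two inputs, one continuous response.
rng = random.Random(0)
rows = [(rng.uniform(0, 10), rng.uniform(0, 5)) for _ in range(200)]
y = [40.0 + 3.0 * a - 2.0 * b + rng.gauss(0, 1.0) for a, b in rows]

# Hold out 30% of the rows as a validation set.
idx = list(range(len(rows)))
rng.shuffle(idx)
cut = len(idx) * 7 // 10
train, valid = idx[:cut], idx[cut:]

# Fit on the training rows only, then score both sets.
coefs = fit_main_effects([rows[i] for i in train], [y[i] for i in train])
train_r2 = r_squared(coefs, [rows[i] for i in train], [y[i] for i in train])
valid_r2 = r_squared(coefs, [rows[i] for i in valid], [y[i] for i in valid])
```

The validation R-squared is the honest number: it is computed on rows the model never saw during fitting, so a large gap between train_r2 and valid_r2 would suggest overfitting.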