Data Mining and Predictive Modeling

Presenter: Jason Wiggins

Introduction to Data Mining and Predictive Modeling

See how to:

  • Understand the goal of a window manufacturing case study: to determine settings that will reduce breakage 
  • Use Distribution to examine the relationship between breakage rate and other variables (8:00)
  • Use Fit Y by X to make simple statistical comparisons between breakage rate and variables (9:14)
    • Uncover relationships using Fit Group, fitting a line to see the relationship between 2 continuous variables, and performing 1-way Analysis of Variance (ANOVA) to examine the relationship between a categorical variable and a continuous response.
  • Use Graph Builder to examine the relationship between variables by fitting different kinds of regression lines (quadratic, cubic) and then comparing the R-squared values to see how much of the variability each fit explains (10:43)
  • Use Prediction Profiler to interactively change factor settings, see how those changes affect the relationships with other factors, and determine which factors have the most impact on the response (14:16)
    • Build a Response Surface model to examine main effects, 2-factor interactions and quadratic relationships
    • Use Stepwise Regression and Standard Least Squares to interactively add and remove terms to find the best model
    • Save the model prediction formula to the Data Table
  • Use Partition to build and compare linear models (21:15)
  • Use JMP Pro and cross-validation to build, compare and locate best predictive model (26:44)
    • Segment data into 3 sets: the Training set estimates model parameters; the Validation set assesses the model's predictive ability, which gets better or worse as terms are added; the Test set is left out of model building and gives the final, independent assessment of the model's predictive ability
    • Use Generalized Regression to create model (28:39) 
    • Interactively create Validation column for cross-validation to avoid overfitting the model (overestimating predictive ability) (29:00)
    • Use Bootstrap Forest and Boosted Trees (Decision Trees) (34:25)
    • Build a Neural Network model with 2 layers and enough nodes to describe the relationship (36:00)
      • Examine diagram to see connections between inputs and nodes
    • Use JMP Pro to compare models and uncover potential overfitting
    • Use JMP Pro to save candidate models and the formulas describing them to Formula Depot for further model profiling and comparison, and for creating scoring code to run models in non-JMP environments using C, Python, JavaScript, SAS or SQL (40:38)
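
The Graph Builder step in the outline (fitting linear, quadratic and cubic lines and comparing R-squared) can be illustrated outside JMP. The following is a minimal sketch in plain Python, not the presenter's method: the data are synthetic, and the function names (solve_linear_system, fit_polynomial, r_squared) are hypothetical helpers written for this illustration.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients (lowest power first) via the normal equations."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def r_squared(xs, ys, coeffs):
    """Share of the variability in ys explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    preds = [sum(c * x ** p for p, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic data (an assumption for illustration): a quadratic trend plus noise.
rng = random.Random(0)
xs = [i / 10 for i in range(-20, 21)]
ys = [1.0 + 0.5 * x + 2.0 * x ** 2 + rng.gauss(0, 0.5) for x in xs]

results = {d: r_squared(xs, ys, fit_polynomial(xs, ys, d)) for d in (1, 2, 3)}
for d, r2 in results.items():
    print(f"degree {d}: R-squared = {r2:.3f}")
```

Because each higher-degree fit contains the lower-degree one as a special case, R-squared on the fitting data cannot decrease as terms are added. That is why the outline pairs this comparison with validation later on: a higher R-squared alone does not show that the extra terms improve prediction.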

Note: Q&A is included at times 24:56, 25:58 and 44:36.
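
The Training/Validation/Test segmentation described in the cross-validation step can be sketched in plain Python. This is only an illustration of the idea, not JMP's implementation: the 60/20/20 split, the seed and the function name make_validation_column are assumptions chosen for the example.

```python
import random
from collections import Counter

def make_validation_column(n_rows, fractions=(0.6, 0.2, 0.2), seed=1):
    """Return a shuffled list of 'Training' / 'Validation' / 'Test' labels, one per row."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_train = round(n_rows * fractions[0])
    # Cap the validation count so the three counts never exceed n_rows.
    n_valid = min(round(n_rows * fractions[1]), n_rows - n_train)
    n_test = n_rows - n_train - n_valid  # the remainder goes to the test set
    labels = ["Training"] * n_train + ["Validation"] * n_valid + ["Test"] * n_test
    # Shuffle so that set membership does not depend on row order.
    random.Random(seed).shuffle(labels)
    return labels

labels = make_validation_column(100)
counts = Counter(labels)
print(counts)
```

A model would then be fitted on the Training rows only, tuned by watching its fit on the Validation rows, and scored once on the Test rows for the final assessment.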
