Usage Note 35411: How do I determine when to stop splitting my tree to avoid overfitting in JMP?
JMP® does not use a stopping rule. Currently, the platform is purely interactive. However, there are several methods that can help:
- The use of p-values for splitting in the Partition platform was implemented in JMP® Version 5.1. When Maximize Significance is chosen as the criterion, the candidate report includes a column titled LogWorth, where LogWorth = -log10(p-value), so larger values indicate more significant splits. Details are documented in a technical report (Monte Carlo Calibration of Distributions of Partition Statistics).
- K-fold cross-validation can be used. This method randomly assigns all the (nonexcluded) rows to one of k groups and then calculates cross-validation statistics.
- Holdout samples can be used. The training/validation/test set approach is explained by Bishop (1995, p. 372). The training data set is used to create the tree. Candidate trees are then compared on an independent validation data set, and the tree with the smallest validation error is selected. The selected tree should then be confirmed by measuring its performance on a third independent set of data, called a test set.
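The k-fold idea above can be sketched in a few lines of plain Python. This is an illustrative toy, not JMP's implementation: the helper name `kfold_cv_error` and the `fit`/`error` callables are assumptions introduced here for demonstration.

```python
import random

def kfold_cv_error(rows, k, fit, error, seed=0):
    """Estimate prediction error by k-fold cross-validation.

    Randomly assigns rows to k folds; each fold is held out once
    while a model is fit on the remaining rows, and the held-out
    errors are averaged.
    """
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal groups

    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held]
        model = fit(train)
        fold_errors.append(
            sum(error(model, rows[i]) for i in held_out) / len(held_out)
        )
    return sum(fold_errors) / k

# Toy usage: a "model" that predicts the training mean,
# scored by squared error on each held-out row.
cv = kfold_cv_error(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    k=3,
    fit=lambda tr: sum(tr) / len(tr),
    error=lambda m, r: (m - r) ** 2,
)
```

A tree that overfits will show a much larger cross-validation error than its in-sample error, which is the signal to stop splitting.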
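The training/validation/test approach begins with a single random split of the rows. A minimal sketch, assuming a 60/20/20 split (the function name and fractions are illustrative choices, not JMP defaults):

```python
import random

def train_valid_test_split(rows, frac=(0.6, 0.2, 0.2), seed=1):
    """Randomly partition rows into training, validation, and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(frac[0] * n)
    n_valid = int(frac[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

One would grow trees of increasing size on `train`, keep the tree whose error on `valid` is smallest, and report that tree's error on `test` as an honest estimate of its performance on new data.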
Bishop, Christopher M. 1995. Neural Networks for Pattern Recognition. New York: Oxford University Press.
Operating System and Release Information
| Product Family | Product | System | Product Release | SAS Release |
|---|---|---|---|---|
| JMP Software | JMP software | Macintosh | 5.1 | |
| | | Microsoft Windows 95/98 | 5.1 | |
| | | Microsoft Windows 2000 Professional | 5.1 | |
| | | Microsoft Windows NT Workstation | 5.1 | |
| | | Microsoft Windows Server 2003 Standard Edition | 5.1 | |
| | | Microsoft Windows XP Professional | 5.1 | |
| | | Windows Millennium Edition (Me) | 5.1 | |
Date Modified: 2009-06-12 11:16:04
Date Created: 2009-04-01 10:54:03