Usage Note 35411: How do I determine when to stop splitting my tree to avoid overfitting in JMP?
JMP® does not use a stopping rule; currently, the platform is purely
interactive. However, several methods can help you decide when to stop splitting.
- Splitting on p-values in the Partition platform was implemented in JMP® Version 5.1. When Maximize Significance is chosen as the criterion, the candidate report includes a column titled LogWorth, which is the negative base-10 logarithm of the adjusted p-value, so larger values indicate more significant candidate splits. Details are documented in a technical report (Monte Carlo Calibration of Distributions of Partition Statistics).
- K-fold crossvalidation can be used. This method randomly assigns each nonexcluded row to one of k groups and then calculates crossvalidation statistics.
- Holdout samples can be used. The training/validation/test set approach is explained by Bishop (1995, p. 372). The training data set is used to create the tree. Candidate trees are then compared on an independent validation data set, and the tree with the smallest validation error is selected. The selected tree should then be confirmed by measuring its performance on a third independent data set, called a test set.
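The holdout approach in the last item can be sketched outside JMP in a few lines of Python. This is only an illustration under stated assumptions, not how the Partition platform works: the function names `split_rows` and `pick_tree_size`, the 60/20/20 split, and the error values are all hypothetical.

```python
import random

def split_rows(rows, fractions=(0.6, 0.2, 0.2), seed=1):
    """Shuffle rows and cut them into training, validation, and test sets.

    The 60/20/20 default is an assumption for illustration only.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_valid = round(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

def pick_tree_size(validation_error_by_splits):
    """Return the number of splits with the smallest validation error.

    Keys are visited in increasing order, so ties go to the smaller tree.
    """
    return min(sorted(validation_error_by_splits),
               key=lambda n_splits: validation_error_by_splits[n_splits])

# Made-up validation errors for trees with 1 to 6 splits (illustrative only).
errors = {1: 0.41, 2: 0.33, 3: 0.29, 4: 0.30, 5: 0.34, 6: 0.39}
best = pick_tree_size(errors)  # 3
```

In practice, each validation error would come from scoring a tree that was grown on the training rows only, and the tree with `best` splits would then be scored once on the test rows to confirm its performance.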
Bishop, Christopher M. 1995. Neural Networks for Pattern Recognition. New York: Oxford University Press.
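The k-fold assignment described above can also be illustrated with a short Python sketch. The names `assign_folds` and `kfold_sse` are hypothetical, the round-robin dealing that keeps the groups balanced is an assumption rather than JMP's documented behavior, and a plain mean stands in for the fitted tree.

```python
import random

def assign_folds(n_rows, k, excluded=(), seed=1):
    """Randomly assign each nonexcluded row index to one of k groups."""
    excluded = set(excluded)
    rows = [i for i in range(n_rows) if i not in excluded]
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows round-robin so group sizes differ by at most one.
    return {row: position % k for position, row in enumerate(rows)}

def kfold_sse(y, folds, k):
    """Sum of squared errors when each group is predicted from the others.

    The mean of the other groups stands in for a fitted tree. Requires
    k >= 2 so that every group has other rows to learn from.
    """
    sse = 0.0
    for group in range(k):
        others = [y[i] for i, g in folds.items() if g != group]
        held_out = [y[i] for i, g in folds.items() if g == group]
        prediction = sum(others) / len(others)
        sse += sum((value - prediction) ** 2 for value in held_out)
    return sse

# Ten rows, with rows 0 and 9 excluded, assigned to five groups.
folds = assign_folds(10, 5, excluded=[0, 9])
```

Comparing such a crossvalidated error across trees of different sizes gives one way to judge when further splits have stopped improving predictions on rows the tree has not seen.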
Operating System and Release Information
| Product Family | Product | System | Reported Release | Fixed Release* |
| JMP Software | JMP software | Macintosh | 5.1 | |
| | | Microsoft Windows 95/98 | 5.1 | |
| | | Microsoft Windows 2000 Professional | 5.1 | |
| | | Microsoft Windows NT Workstation | 5.1 | |
| | | Microsoft Windows Server 2003 Standard Edition | 5.1 | |
| | | Microsoft Windows XP Professional | 5.1 | |
| | | Windows Millennium Edition (Me) | 5.1 | |
| | | Windows Vista | 5.1 | |
| | | Linux | 5.1 | |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
| Date Modified: | 2009-06-12 11:16:04 |
| Date Created: | 2009-04-01 10:54:03 |