Variations of partitioning go by many names and brand names: decision trees, CARTTM, CHAIDTM, C4.5, C5, and others. The technique is often taught as a data mining technique because:
A classic application is where you want to turn a data table of symptoms and diagnoses of a certain illness into a hierarchy of questions. These question help diagnose new patients more quickly.
The factor columns (X’s) can be either continuous or categorical (nominal or ordinal). If an X is continuous, then the splits (partitions) are created by a cutting value. The sample is divided into values below and above this cutting value. If the X is categorical, then the sample is divided into two groups of levels.
The response column (Y) can also be either continuous or categorical (nominal or ordinal). If Y is continuous, then the platform fits means. If Y is categorical, then the fitted value is a probability. In either case, the split is chosen to maximize the difference in the responses between the two branches of the split.