The graph for goodness of fit depends on which type of response you use. The Actual by Predicted plot is for continuous responses, and the ROC Curve and Lift Curve are for categorical responses.
For continuous responses, the Actual by Predicted plot shows how well the model fits the data. Each leaf is predicted with its mean, so the x-coordinates are these means. The actual values form a scatter of points around each leaf mean. A diagonal line represents the locus of where predicted and actual values are the same. For a perfect fit, all the points would be on this diagonal. When validation is used, plots are shown for both the training and the validation sets. See Actual by Predicted Plots for Boston Housing Data.
When you fit a Decision Tree, observations in a leaf have the same predicted value. If there are n leaves, then the Actual by Predicted plot shows at most n distinct predicted values. This gives the plot the appearance of having points arranged on vertical lines. Each of these lines corresponds to a predicted value for some leaf.
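The vertical-line pattern can be reproduced with a small sketch. This is not the Partition platform's code; the function name `leaf_predictions`, the leaf labels, and the response values are made up for illustration. Each row is predicted with the mean of its leaf, so the x-coordinates take at most as many distinct values as there are leaves.

```python
from collections import defaultdict

def leaf_predictions(leaf_ids, actual):
    """Predict each row with the mean response of the leaf it falls in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for leaf, y in zip(leaf_ids, actual):
        totals[leaf] += y
        counts[leaf] += 1
    means = {leaf: totals[leaf] / counts[leaf] for leaf in totals}
    return [means[leaf] for leaf in leaf_ids]

# Hypothetical data: six rows falling into three leaves.
leaf_ids = ["A", "A", "B", "B", "B", "C"]
actual = [10.0, 14.0, 20.0, 22.0, 24.0, 30.0]

predicted = leaf_predictions(leaf_ids, actual)
points = list(zip(predicted, actual))   # (x, y) pairs for the plot
distinct_x = sorted(set(predicted))     # one x-coordinate per leaf
```

With three leaves, `distinct_x` holds three values, and the points for each leaf stack vertically above its mean.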
The ROC curve is for categorical responses. The classical definition of the ROC curve plots the count of True Positives against the count of False Positives as you accumulate the frequencies across a rank ordering. The True Positive Y-axis is labeled “Sensitivity” and the False Positive X-axis is labeled “1-Specificity”. If you slide a cutoff across the rank-ordered predictor and classify everything to the left as positive and everything to the right as negative, the curve traces the trade-off between sensitivity and specificity across the predictor's values.
To generalize for polytomous cases (more than 2 response levels), Partition creates an ROC curve for each response level versus the other levels. If there are only two levels, one is the diagonal reflection of the other, representing the different curves based on which is regarded as the “positive” response level.
An ROC curve is nothing more than a plot of the sorting efficiency of the model. The model rank-orders the fitted probabilities for a given Y-value. Starting at the lower left corner, the curve is drawn up when the row comes from that category and to the right when the row comes from another category.
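The drawing rule described here can be sketched in a few lines. The function `roc_points` and the sample data are hypothetical and only illustrate the idea; the sketch assumes no tied fitted probabilities and scales both axes to the 0 to 1 range.

```python
def roc_points(probs, labels, positive):
    """Trace an ROC curve row by row, from the highest fitted probability down.

    Moves up for a row in the positive category, right for any other row.
    Assumes there are no tied probabilities.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    up = right = 0
    points = [(0.0, 0.0)]  # start at the lower left corner
    for i in order:
        if labels[i] == positive:
            up += 1
        else:
            right += 1
        points.append((right / n_neg, up / n_pos))
    return points

# Hypothetical fitted probabilities for Y=1 and the observed responses.
probs = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(probs, labels, positive=1)
```

The first two rows are both Y=1, so the curve climbs twice before it takes its first step to the right.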
In the following picture, the Y axis shows the number of Ys where Y=1, and the X axis shows the number of Ys where Y=0.
If the model perfectly rank-orders the response values, then the sorted data has all the targeted values first, followed by all the other values. The curve moves all the way to the top before it moves at all to the right.
If the model does not predict well, the curve wanders more or less diagonally from the bottom left to the top right.
In practice, the curve lifts off the diagonal. The area under the curve is the indicator of the goodness of fit, with 1 being a perfect fit.
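As a rough check on these two extremes, the area can be computed with the trapezoid rule. The helper `area_under_curve` below is an illustrative assumption, not the platform's calculation; it takes a list of (x, y) points ordered from left to right.

```python
def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# A perfect ordering goes straight up, then straight across.
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# An uninformative ordering follows the diagonal.
diagonal = [(0.0, 0.0), (1.0, 1.0)]

auc_perfect = area_under_curve(perfect)
auc_diagonal = area_under_curve(diagonal)
```

The perfect curve encloses the whole unit square, while the diagonal encloses half of it, so a curve that lifts off the diagonal has an area between those two values.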
If a partition contains a section that is all or almost all one response level, then the curve lifts almost vertically at the left for a while. This means that a sample is almost completely sensitive to detecting that level. If a partition contains none or almost none of a response level, the curve at the top crosses almost horizontally for a while. This means that there is a sample that is almost completely specific to not having that response level.
Because partitions contain clumps of rows with the same (that is, tied) predicted rates, the curve actually moves along a slant across each clump, rather than purely up or purely to the right.
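One way to see the slanted segments is to advance the curve once per group of tied rows instead of once per row. This is a sketch under that assumption; the function name `roc_points_with_ties` and the three-leaf data are invented for illustration.

```python
from itertools import groupby

def roc_points_with_ties(probs, labels, positive):
    """Trace an ROC curve, drawing one slanted segment per group of tied rows."""
    rows = sorted(zip(probs, labels), key=lambda r: r[0], reverse=True)
    n_pos = sum(1 for _, y in rows if y == positive)
    n_neg = len(rows) - n_pos
    up = right = 0
    points = [(0.0, 0.0)]
    for _, group in groupby(rows, key=lambda r: r[0]):
        # All rows in a leaf share one predicted rate, so they move together.
        for _, y in group:
            if y == positive:
                up += 1
            else:
                right += 1
        points.append((right / n_neg, up / n_pos))
    return points

# Hypothetical tree with three leaves whose predicted rates are 0.75, 0.5, and 0.0.
probs = [0.75] * 4 + [0.5] * 2 + [0.0] * 2
labels = [1, 1, 1, 0, 1, 0, 0, 0]
points = roc_points_with_ties(probs, labels, positive=1)
```

The first leaf holds three positives and one negative, so its segment rises steeply but not vertically; the last leaf holds only negatives, so its segment is horizontal.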
For polytomous cases, you can see which response categories lift off the diagonal the most. In the CarPoll example above, the European cars are being identified much less than the other two categories. The American cars start out with the most sensitive response (Size(Large)), and the Japanese cars with the most specific negative response (the small share of Size(Large) among Japanese cars).
A lift curve shows the same information as an ROC curve, but in a way that dramatizes the richness of the ordering at the beginning. The Y-axis shows the ratio of how rich that portion of the population is in the chosen response level compared to the rate of that response level in the population as a whole. For example, the top-rated 10% of fitted probabilities might have a 25% richness of the chosen response compared with 5% richness over the whole population. Then the lift curve goes through the X-coordinate of 0.10 at a Y-coordinate of 25% / 5%, or 5. All lift curves reach (1,1) at the right, because the population as a whole has the general response rate.
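The arithmetic in this example can be sketched as follows. The function `lift_at` and the 40-row data set are hypothetical, chosen so that the top 10% of rows has a 25% response rate and the whole population has a 5% rate.

```python
def lift_at(probs, labels, positive, portion):
    """Lift for the top `portion` of rows when ranked by fitted probability."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    k = max(1, round(portion * n))  # number of rows in the top portion
    hits_top = sum(1 for i in order[:k] if labels[i] == positive)
    hits_all = sum(1 for y in labels if y == positive)
    # (hits_top / k) / (hits_all / n), rearranged to avoid rounding error
    return hits_top * n / (k * hits_all)

# Hypothetical data: 40 rows with distinct, decreasing fitted probabilities.
probs = [(40 - i) / 40 for i in range(40)]
labels = [0] * 40
labels[2] = 1    # one responder among the top four rows (25% richness)
labels[25] = 1   # two responders overall (5% richness)

lift_top_10 = lift_at(probs, labels, positive=1, portion=0.10)
lift_all = lift_at(probs, labels, positive=1, portion=1.0)
```

Here `lift_top_10` is the Y-coordinate at X = 0.10, and `lift_all` is the value at the right edge of the curve.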
When the response rate for a category is very low anyway (for example, a direct mail response rate), the lift curve explains things with more detail than the ROC curve.