The Uplift Model Report

The report opens by showing the Graph and the initial node of the Tree, as well as controls for splitting.

The graph plots the response on the vertical axis. The horizontal axis corresponds to observations, arranged by nodes. For each node, a black horizontal line shows the mean response. Within each split, a subsplit for treatment is shown by a red line and a blue line, which indicate the mean response for each of the two treatment groups within the split. The value ordering of the treatment column determines the placement order of these lines. As nodes are split, the graph updates to show the splits beneath the horizontal axis. Vertical lines divide the splits.

Beneath the graph are the control buttons: Split, Prune, and Go. The Go button appears only if there is a validation set. Also shown is the name of the Treatment column and its two levels, called Treatment1 and Treatment2. If the Treatment column has more than two levels, all levels except the first are combined into a single level, Treatment2.
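The level-combining rule above can be illustrated with a minimal Python sketch. The function name and the example level names are hypothetical and are not part of JMP; the sketch only assumes that the first level in the value ordering becomes Treatment1 and every other level becomes Treatment2.

```python
def collapse_treatment_levels(values, level_order):
    """Map each treatment value to 'Treatment1' or 'Treatment2'.

    level_order is the value ordering of the treatment column; only its
    first entry is kept as a separate level (illustrative assumption).
    """
    first_level = level_order[0]
    return ["Treatment1" if v == first_level else "Treatment2" for v in values]


# Hypothetical three-level treatment column.
labels = collapse_treatment_levels(
    ["promo", "email", "none", "promo"],
    level_order=["promo", "email", "none"],
)
```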

To the right of the Treatment column information is a report showing summary values relating to prediction. (Keep in mind that prediction is not the objective in uplift modeling.) The report updates as splitting occurs. If a validation set is used, values are shown for both the training and the validation sets.

RSquare

The RSquare for the regression model associated with the tree. Note that the regression model includes interactions with the treatment column.

RMSE

The root mean square error (RMSE) for the regression model associated with the tree. RMSE is only given for continuous responses. For more details, see Fitting Linear Models.

N

The number of observations.

Number of Splits

The number of times splitting has occurred.

AICc

The Corrected Akaike Information Criterion (AICc), computed using the associated regression model. AICc is only given for continuous responses. For more details, see Fitting Linear Models.

Uplift Decision Tree

The decision tree shows the splits used to model uplift. See Nodes for First Split for an example using the Hair Care Product.jmp sample data table. Each node contains the following information:

Treatment

The name of the treatment column is shown, with its two levels.

Rate

Only appears for two-level categorical responses. For each treatment level, the proportion of subjects in this node who responded.

Mean

Only appears for continuous responses. For each treatment level, the mean response for subjects in this node.

Count

The number of subjects in this node in the specified treatment level.

t Ratio

The t ratio for the test for a difference in response across the levels of Treatment for subjects in this node. If the response is categorical, it is treated as continuous (values 0 and 1) for this test.

Trt Diff

The difference in response means across the levels of Treatment. This is the uplift, assuming that:

‒	The first level in the treatment column’s value ordering represents the treatment.

‒	The response is defined so that larger values reflect greater impact.
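As a rough illustration of the t Ratio and Trt Diff entries, the following Python sketch computes both for a single node. It assumes a pooled-variance two-sample t statistic, which the text above does not confirm, and the function name is hypothetical. A two-level categorical response is coded as 0 and 1, as described for the t Ratio.

```python
import math


def trt_diff_and_t_ratio(y_trt1, y_trt2):
    """Return (difference in means, t ratio) for one node.

    y_trt1, y_trt2: responses for the first and second treatment level.
    Assumes a pooled-variance t statistic (an assumption of this sketch).
    """
    n1, n2 = len(y_trt1), len(y_trt2)
    m1 = sum(y_trt1) / n1
    m2 = sum(y_trt2) / n2
    ss1 = sum((y - m1) ** 2 for y in y_trt1)
    ss2 = sum((y - m2) ** 2 for y in y_trt2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    diff = m1 - m2  # first level minus second level, taken as the uplift
    return diff, diff / std_err


# Hypothetical 0/1 responses in one node.
diff, t_ratio = trt_diff_and_t_ratio([1, 1, 0, 1], [0, 1, 0, 0])
```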

LogWorth

The logworth for the next split made from the given node.

Nodes for First Split

Candidates Report

Each node also contains a Candidates report. This report gives:

Term

The model term.

LogWorth

The maximum logworth over all possible splits for the given term. The logworth corresponding to a split is -log10 of the adjusted p-value.
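The logworth definition can be sketched in a few lines of Python. The adjusted p-values below are made-up inputs; how the p-values are adjusted is not covered here and is not reproduced by this sketch.

```python
import math


def logworth(adjusted_p_value):
    """Logworth of a split: -log10 of its adjusted p-value."""
    return -math.log10(adjusted_p_value)


# Hypothetical adjusted p-values for three candidate splits of one term.
candidate_p_values = [0.20, 0.003, 0.04]

# The Candidates report lists the maximum logworth over the term's splits.
term_logworth = max(logworth(p) for p in candidate_p_values)
```

A smaller adjusted p-value gives a larger logworth, so the maximum logworth corresponds to the most significant candidate split.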

F Ratio

When the response is continuous, this is the F Ratio associated with the interaction term in a linear regression model. The regression model specifies the response as a linear function of the treatment, the binary split, and their interaction. When the response is categorical, this is the ChiSquare value for the interaction term in a nominal logistic model.

Gamma

When the response is continuous, this is the coefficient of the interaction term in the linear regression model used in computing the F ratio. When the response is categorical, this is an estimate of the interaction constructed from Firth-adjusted log-odds ratios.
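For a continuous response, the F Ratio and Gamma described above can be sketched as follows. Because the treatment and the split are both binary, the regression with an interaction is saturated, so the interaction coefficient and its F ratio reduce to formulas on the four cell means. The sketch assumes 0/1 indicator coding, which is an assumption: a different coding would leave the F ratio unchanged but rescale the coefficient. The function name is hypothetical.

```python
def interaction_f_and_gamma(y, treat, split):
    """Return (F ratio, interaction coefficient) for y ~ treat + split + treat*split.

    treat and split are 0/1 indicators (assumed coding). Every one of the
    four treat-by-split cells must be non-empty and len(y) must exceed 4.
    """
    cells = {(t, s): [] for t in (0, 1) for s in (0, 1)}
    for yi, ti, si in zip(y, treat, split):
        cells[(ti, si)].append(yi)

    means = {k: sum(v) / len(v) for k, v in cells.items()}
    # Residual sum of squares of the saturated model (within-cell variation).
    sse = sum((yi - means[k]) ** 2 for k, v in cells.items() for yi in v)
    mse = sse / (len(y) - 4)

    # Difference between the treatment effects on the two sides of the split.
    gamma = (means[(1, 1)] - means[(0, 1)]) - (means[(1, 0)] - means[(0, 0)])
    var_gamma = mse * sum(1.0 / len(v) for v in cells.values())
    return gamma * gamma / var_gamma, gamma


# Hypothetical data: two observations per cell.
y = [5, 7, 1, 3, 2, 4, 2, 4]
treat = [1, 1, 0, 0, 1, 1, 0, 0]
split = [1, 1, 1, 1, 0, 0, 0, 0]
f_ratio, gamma = interaction_f_and_gamma(y, treat, split)
```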

Cut Point

If the term is continuous, this is the point that defines the split. If the term is categorical, this describes the first (left) node.

Uplift Report Options

With the exception of the options described below, all of the red triangle options for the Uplift report are described in the documentation for the Partition platform. For details about these options, see the Partition Models chapter in the Specialized Models book.

Minimum Size Split

This option presents a window where you enter a number or a fractional portion of the total sample size to define the minimum size split allowed. To specify a number, enter a value greater than or equal to 1. To specify a fraction of the sample size, enter a value less than 1. The default value for the Uplift platform is set to 25 or the floor of the number of rows divided by 2,000, whichever value is greater.
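The default and the two ways of reading an entry can be expressed as a short Python sketch. The function names are hypothetical, and the sketch does not round a fractional entry because the rounding rule is not stated above.

```python
def default_min_split_size(n_rows):
    """Default minimum size split: the larger of 25 and floor(n_rows / 2000)."""
    return max(25, n_rows // 2000)


def resolve_min_split_size(entry, n_rows):
    """Interpret an entry: >= 1 is a row count, < 1 is a fraction of n_rows.

    The fractional case is returned unrounded (rounding is not specified).
    """
    if entry >= 1:
        return entry
    return entry * n_rows


small_table_default = default_min_split_size(1_000)    # 25 dominates
large_table_default = default_min_split_size(100_000)  # floor(100000 / 2000) dominates
```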

Column Uplift Contributions

This table and plot address a column’s contribution to the uplift tree structure. A column’s contribution is computed as the sum of the F Ratio values associated with its splits. Recall that these values measure the significance of the treatment-by-split interaction term in the linear regression model.
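The contribution calculation amounts to a grouped sum, sketched below in Python. The split list and its F Ratio values are invented for illustration and do not come from any sample data table.

```python
from collections import defaultdict


def column_contributions(splits):
    """Sum F Ratio values by splitting column.

    splits: iterable of (column name, F ratio) pairs, one per split in the tree.
    """
    totals = defaultdict(float)
    for column, f_ratio in splits:
        totals[column] += f_ratio
    return dict(totals)


# Hypothetical tree with three splits, two of them on the same column.
contributions = column_contributions(
    [("Age", 12.5), ("Gender", 8.0), ("Age", 3.5)]
)
```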

Uplift Graph

Consider the observations in the training set. Define uplift for an observation as the difference between the predicted probabilities or means across the levels of Treatment for the observation’s terminal node. These uplift values are sorted in descending order. On its vertical axis, the Uplift Graph shows the uplift values. On its horizontal axis, the graph shows the proportion of observations with each uplift value.
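One way to picture how the graph is built is the Python sketch below. It takes the uplift and the number of observations for each terminal node, sorts the nodes by uplift in descending order, and returns the horizontal extent of each step as a proportion of all observations. The function name and the node values are hypothetical.

```python
def uplift_graph_steps(nodes):
    """Return (start, end, uplift) steps for an uplift graph.

    nodes: list of (uplift, observation count) pairs, one per terminal node.
    start and end are cumulative proportions of observations on the
    horizontal axis; uplift is the height of the step.
    """
    total = sum(count for _, count in nodes)
    steps = []
    start = 0.0
    for uplift, count in sorted(nodes, key=lambda node: node[0], reverse=True):
        end = start + count / total
        steps.append((start, end, uplift))
        start = end
    return steps


# Hypothetical terminal nodes: (uplift, number of observations).
steps = uplift_graph_steps([(0.05, 50), (-0.02, 25), (0.10, 25)])
```

Steps with negative uplift fall at the right end of the graph, which makes groups for which the treatment has a negative effect easy to spot.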

See Uplift Graph for an example of an Uplift Graph for the Hair Care Product.jmp sample data table after three splits. Note that, for two groups of subjects (males and non-blond women in the Age ≥ 42 group), the promotion has a negative effect.

The horizontal lines shown on the Uplift Graph delineate the graph for the validation set. Specifically, the decision tree is evaluated for the validation set and the Uplift Graph is constructed from the estimated uplifts.

Uplift Graph

Save Columns

Save Difference

Saves the estimated difference in mean responses across levels of Treatment for the observation’s node. This is the estimated uplift.

Save Difference Formula

Saves the formula for the Difference, or uplift.