The Liver Cancer.jmp sample data table contains liver cancer Node Count values for 136 patients. It also includes measurements on six potentially related variables: BMI, Age, Time, Markers, Hepatitis, and Jaundice. These columns are described in Column Notes in the data table.
This example develops a prediction model for Node Count using the six predictors. Node Count is modeled using a Poisson distribution.
1. Open the Liver Cancer.jmp sample data table.
2. Select Analyze > Fit Model.
3. Select Node Count from the Select Columns list and click Y.
4. Select BMI through Jaundice and click Macros > Factorial to degree.
6. From the Personality list, select Generalized Regression.
8. Click Run.
9. Select Lasso from the Estimation Method list.
10.
The Solution Path is shown in Solution Path for Lasso Fit with Nonzero Terms Highlighted. The paths for terms that have nonzero coefficients are highlighted. Think of the solution paths as moving from right to left across the plot, as the solutions shrink farther from the MLE. A number of terms have paths that shrink them to zero fairly early.
Solution Path for Lasso Fit with Nonzero Terms Highlighted
The Parameter Estimates for Centered and Scaled Predictors report (Parameter Estimates Report with Nonzero Terms Highlighted) shows the parameter estimates for the centered and scaled data. The Parameter Estimates for Original Predictors report shows the estimates for the uncentered and unscaled data. The eleven terms with nonzero parameter estimates are highlighted in both reports. These include interaction effects. In the data table, all six predictor columns are selected because every predictor column appears in a term that has a nonzero coefficient.
12. Click the row for (Age - 56.3994)*Markers[0] in either the Parameter Estimates for Centered and Scaled Predictors report or the Parameter Estimates for Original Predictors report.
Parameter Estimates Report with Nonzero Terms Highlighted
13. From the Adaptive Lasso with Validation Column Validation report’s red triangle menu, select Save Columns > Save Prediction Formula and Save Columns > Save Variance Formula.
Two columns are added to the data table: Node Count Prediction Formula and Node Count Variance.
14. Right-click either column header and select Formula to view the formula. Alternatively, click the plus sign to the right of the column name in the Columns panel.
The formula in the Node Count Prediction Formula column applies the exponential function to the estimated linear part of the model. The formula in the Node Count Variance column is identical, because the variance of a Poisson distribution equals its mean.
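The relationship between the two saved formulas can be sketched in Python. This is a minimal illustration that assumes a log link, so that the mean is the exponential of the linear part. The function names, the intercept, the coefficients, and the predictor values are all hypothetical and are not the estimates from the Liver Cancer fit.

```python
import math

def poisson_mean(intercept, coefs, x):
    """Mean of a Poisson model with a log link: exp(linear part)."""
    linear_part = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return math.exp(linear_part)

def poisson_variance(intercept, coefs, x):
    """For a Poisson distribution, the variance equals the mean."""
    return poisson_mean(intercept, coefs, x)

# Hypothetical values for illustration only; not estimates from Liver Cancer.jmp.
intercept = 0.5
coefs = [0.02, -0.3]
x = [10.0, 1.0]

mean = poisson_mean(intercept, coefs, x)
variance = poisson_variance(intercept, coefs, x)
print(f"predicted mean = {mean:.4f}, predicted variance = {variance:.4f}")
```

Because the variance function simply returns the mean, the two printed values always agree, which mirrors the identical formulas in the two saved columns.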
1. Open the Liver Cancer.jmp sample data table.
2. Select Analyze > Fit Model.
3. Select Severity from the Select Columns list and click Y.
4. Select BMI through Jaundice and click Macros > Factorial to degree.
All terms up to degree 2 (the default in the Degree box) are added to the model.
6. From the Personality list, select Generalized Regression.
8. Click Run.
9. Elastic Net is selected as the Estimation Method. Click Go.
The Solution Path is shown in Solution Path Plot. The paths for terms that have nonzero coefficients are shown in blue. The optimal parameter values are substantially shrunken away from the MLE.
Solution Path Plot
The Effect Tests report shows that only one term, Time, is significant at the 0.05 level. However, the Time*Markers interaction has a p-value of 0.0665, which is only slightly above that threshold.
11. To see how Time and the Time*Markers interaction affect Severity, select Profiler from the Adaptive Elastic Net report’s red triangle menu.
Profiler for Probability That Severity = High, Time Low
12. Move the red dashed line for Time from left to right to see its interaction with Markers (Profiler for Probability That Severity = High, Time Low and Profiler for Probability That Severity = High, Time High). For patients who enter the study with small values of Time since diagnosis, Markers have little impact on Severity. But for patients who enter the study having been diagnosed for a longer time, Markers are important. For those patients, normal markers suggest a lower probability of high Severity.
Profiler for Probability That Severity = High, Time High
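The behavior seen in the two profiler views can be imitated with a small Python sketch. It assumes a logistic (logit link) model for the probability that Severity = High, with an indicator for abnormal markers and a Time*Markers product term. The function names, the coefficient values, and the Time values are hypothetical, chosen only to reproduce the qualitative pattern described above. They are not the fitted estimates.

```python
import math

def prob_high(time, markers_abnormal, b0, b_time, b_markers, b_interaction):
    """Logistic probability of high Severity with a Time*Markers interaction."""
    m = 1.0 if markers_abnormal else 0.0
    linear_part = b0 + b_time * time + b_markers * m + b_interaction * time * m
    return 1.0 / (1.0 + math.exp(-linear_part))

# Hypothetical coefficients for illustration only.
coef = dict(b0=-1.0, b_time=0.02, b_markers=0.0, b_interaction=0.05)

def markers_gap(time):
    """Difference in probability between abnormal and normal markers at a given Time."""
    return prob_high(time, True, **coef) - prob_high(time, False, **coef)

gap_low = markers_gap(0.0)    # short time since diagnosis
gap_high = markers_gap(40.0)  # long time since diagnosis
print(f"gap at low Time = {gap_low:.3f}, gap at high Time = {gap_high:.3f}")
```

With these made-up values the gap is zero at Time = 0 and grows as Time increases, so normal markers correspond to a lower probability of high Severity only for patients with longer times since diagnosis.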
The Fishing.jmp sample data table contains fictional data for a study of various factors that affect the number of fish caught by groups visiting a park. The data table contains 250 responses from families or groups of traveling companions. This example models the number of Fish Caught as a function of Live Bait, Fishing Poles, Camper, People, and Children. These columns are described in Column Notes in the data table.
The data table contains a hidden column called Fished. During data collection, it was never determined whether anyone in the group had actually fished. However, the Fished column is included in the table to emphasize the point that catching zero fish can happen in one of two ways: Either no one in the group fished, or everyone who fished in the group was unlucky.
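The two routes to a zero count can be written as a mixture. The Python sketch below assumes a zero-inflated Poisson form, which the zero-inflation parameter mentioned at the end of this example suggests. The function name and the values of `pi_zero` (the probability that no one in the group fished) and `lam` (the expected catch for a group that did fish) are hypothetical.

```python
import math

def zero_probability(pi_zero, lam):
    """Split P(count = 0) into its two sources under a zero-inflated Poisson."""
    no_one_fished = pi_zero                                # structural zero
    fished_but_unlucky = (1.0 - pi_zero) * math.exp(-lam)  # Poisson zero
    return no_one_fished, fished_but_unlucky, no_one_fished + fished_but_unlucky

# Hypothetical values for illustration only.
not_fished, unlucky, total = zero_probability(pi_zero=0.4, lam=2.0)
print(f"did not fish: {not_fished:.4f}, unlucky: {unlucky:.4f}, total: {total:.4f}")
```

The total probability of a zero always exceeds what a plain Poisson model with the same `lam` would give, which is why an ordinary Poisson fit tends to underpredict zeros in data of this kind.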
1. Open the Fishing.jmp sample data table.
2. Select Analyze > Fit Model.
3. Select Fish Caught from the Select Columns list and click Y.
4. Select Live Bait through Children and click Macros > Factorial to degree.
Terms up to degree 2 (the default in the Degree box) are added to the model.
6. From the Personality list, select Generalized Regression.
8. Click Run.
9.
Parameter Estimates for Centered and Scaled Predictors Report
The Effect Tests report indicates that four terms are significant at the 0.05 level: Live Bait, Fishing Poles, Fishing Poles*Camper, and Fishing Poles*Children.
10. Select Profiler from the red triangle menu for the Adaptive Elastic Net with Validation Column Validation report.
A desirability function is imposed on the response, which indicates that maximizing the number of Fish Caught is desirable. (See the Profilers book for more information about desirability functions.)
The Profiler, with settings that maximize the number of fish caught, is shown in Prediction Profiler with Fish Caught Maximized. You can vary the settings to see the impact of the significant effects: Live Bait, Fishing Poles, Fishing Poles*Camper, and Fishing Poles*Children. For example, live bait is associated with more fish; campers tend to bring more fishing poles than non-campers and, therefore, catch more fish.
Prediction Profiler with Fish Caught Maximized
13. From the Adaptive Elastic Net with Validation Column Validation report’s red triangle menu, select Save Columns > Save Prediction Formula and Save Columns > Save Variance Formula.
Two columns are added to the data table: Fish Caught Prediction Formula and Fish Caught Variance.
14. Right-click either column header and select Formula to view the formula. Alternatively, click the plus sign to the right of the column name in the Columns panel. Note the appearance of the estimated zero-inflation parameter, 0.7864579, in both of these formulas.
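To see why the zero-inflation parameter enters both formulas, the Python sketch below evaluates the textbook mean and variance of a zero-inflated Poisson distribution. It assumes that 0.7864579 plays the role of the structural-zero probability and that `lam` stands for the exponential of the linear part for one row. The function names and the value of `lam` are hypothetical, and the exact layout of the saved formulas should be checked in the formula editor.

```python
def zip_mean(pi_zero, lam):
    """Mean of a zero-inflated Poisson: (1 - pi) * lambda."""
    return (1.0 - pi_zero) * lam

def zip_variance(pi_zero, lam):
    """Variance of a zero-inflated Poisson: (1 - pi) * lambda * (1 + pi * lambda)."""
    return (1.0 - pi_zero) * lam * (1.0 + pi_zero * lam)

PI_ZERO = 0.7864579  # zero-inflation estimate quoted in the text
lam = 5.0            # hypothetical Poisson mean for one row (assumption)

mean = zip_mean(PI_ZERO, lam)
variance = zip_variance(PI_ZERO, lam)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Unlike the plain Poisson case, the variance here exceeds the mean whenever the zero-inflation probability and `lam` are both positive, so the two saved columns are no longer expected to contain identical formulas.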