Scatterplots and other such graphs can help you visualize relationships between variables. Once you have visualized relationships, the next step is to analyze those relationships so that you can describe them numerically. That numerical description of the relationship between variables is called a model. Even more importantly, a model also predicts the average value of one variable (Y) from the value of another variable (X). The X variable is also called a predictor. Generally, this model is called a regression model.
With JMP, the Fit Y by X platform and the Fit Model platform create regression models.
Relationship Types shows the four primary types of relationships.
 • Using Regression with One Predictor
 • Using Regression with Multiple Predictors
 • Comparing Averages for One Variable
 • Comparing Averages for Multiple Variables
This example uses the Companies.jmp data table, which contains financial data for 32 companies from the pharmaceutical and computer industries.
 • Discovering the Relationship
 • Fitting the Regression Model
 • Predicting Average Sales
Scatterplot of Sales (\$M) versus # Employ
To predict the sales revenue from the number of employees, fit a regression model. From the red triangle for Bivariate Fit, select Fit Line. A regression line is added to the scatterplot and reports are added to the report window.
Regression Line
 • the p-value of <.0001
 • the RSquare value of 0.618
 • The p-value is less than the significance level of 0.05. Therefore, including the number of employees in the prediction model significantly improves the ability to predict average sales.
 • Since the RSquare value in this example is large, it supports the conclusion that a prediction model based on the number of employees can predict sales revenue. The RSquare value shows the strength of the linear relationship between the variables. With a single predictor, it is the square of the correlation between them. An RSquare of 0 indicates no linear relationship between the variables, and an RSquare of 1 indicates a perfect linear relationship.
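As a rough illustration of the quantities behind a fitted line, the following sketch computes a least-squares line and its RSquare value in plain Python. The data values and the function name `fit_line` are hypothetical (they are not taken from Companies.jmp), and this is not a description of how JMP itself performs the calculation.

```python
from math import sqrt

def fit_line(x, y):
    """Return (intercept, slope, r_square) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, RSquare is the squared Pearson correlation.
    correlation = sxy / sqrt(sxx * syy)
    return intercept, slope, correlation ** 2

# Hypothetical data: number of employees (thousands) and sales ($M).
employees = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope, r_square = fit_line(employees, sales)
print(f"Sales = {intercept:.3f} + {slope:.3f} * Employees, RSquare = {r_square:.3f}")
```

The closer the points lie to the fitted line, the closer the RSquare value is to 1.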
 1 Click on the outlier.
 2 Select Rows > Exclude/Unexclude.
 3 Fit this model by selecting Fit Line from the red triangle menu for Bivariate Fit.
 • a new regression line
 • a new Linear Fit report, which includes:
 ‒ a new prediction equation
 ‒ a new RSquare value
Comparing the Models
Using the results in Comparing the Models, the data analyst can make the following conclusions:
 • The outlier was pulling down the regression line for the larger companies, and pulling the line up for the smaller companies.
 • The new model fits the data better, since the new RSquare value (0.88) is closer to 1 than the first RSquare value (0.618).
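The effect of excluding an outlier can be sketched in plain Python. The example below uses hypothetical data (not the Companies.jmp values) and models the excluded row state as a simple Boolean flag, which is an assumption made for illustration only.

```python
def fit_line(points):
    """Return (slope, r_square) of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Hypothetical rows: (x, y, excluded). The last row is an outlier with a
# large x value and a low y value, flagged as excluded.
rows = [
    (1, 2, False), (2, 4, False), (3, 6, False),
    (4, 8, False), (5, 10, False), (6, 12, False),
    (20, 5, True),
]

all_points = [(x, y) for x, y, _ in rows]
kept_points = [(x, y) for x, y, excluded in rows if not excluded]

slope_all, r2_all = fit_line(all_points)
slope_kept, r2_kept = fit_line(kept_points)
print(f"All rows:         slope = {slope_all:.3f}, RSquare = {r2_all:.3f}")
print(f"Outlier excluded: slope = {slope_kept:.3f}, RSquare = {r2_kept:.3f}")
```

In this constructed data set, the single outlier flattens the fitted line and lowers the RSquare value, and refitting without it restores both.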
The first model predicted \$7499.68, so the new model's prediction is \$1461.69 higher.
This example uses the Companies.jmp data table, which contains financial data for 32 companies from the pharmaceutical and computer industries.
 • How do the profits of computer companies compare to the profits of pharmaceutical companies?
To answer this question, fit Profits (\$M) by Type.
 1 Select Help > Sample Data Library and open Companies.jmp.
 2 If you still have the Companies.jmp sample data table open, you might have rows that are excluded or hidden. To return the rows to the default state (all rows included and none hidden), select Rows > Clear Row States.
 3 Select Analyze > Fit Y by X.
 4 Select Profits (\$M) and click Y, Response.
 5 Select Type and click X, Factor.
 6 Click OK.
Profits by Company Type
 1 Click on the outlier.
 2 Select Rows > Exclude/Unexclude. The data point is no longer included in calculations.
 3 Select Rows > Hide/Unhide. The data point is hidden from all graphs.
 4 To re-create the plot without the outlier, select Script > Redo Analysis from the red triangle menu for Oneway Analysis. You can close the original Scatterplot window.
Updated Plot
 5 To continue analyzing the relationship, select these options from the red triangle menu for Oneway Analysis:
 ‒ Display Options > Mean Lines. This adds mean lines to the scatterplot.
 ‒ Means and Std Dev. This displays a report that provides averages and standard deviations.
Mean Lines and Report
 • Does a difference exist in the broader population, or is the difference of \$635 million due to chance?
 • If there is a difference, what is it?
To perform the t-test, select Means/Anova/Pooled t from the red triangle for Oneway Analysis.
t Test Results
Use the confidence interval limits to determine how much difference exists in the profits of both types of companies. Look at the Upper CL Dif and Lower CL Dif values in t Test Results. The financial analyst concludes that the average profit of pharmaceutical companies is between \$343 million and \$926 million higher than the average profit of computer companies.
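The arithmetic behind a pooled t-test and its confidence limits can be sketched as follows. The profit values and the function name `pooled_t_interval` are hypothetical, and the critical value is passed in by hand (2.306 is the two-sided 95% value for 8 degrees of freedom) because the Python standard library does not provide the t distribution.

```python
from math import sqrt

def pooled_t_interval(group_a, group_b, t_critical):
    """Return (difference, t_ratio, lower_cl, upper_cl) for mean(a) - mean(b)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    # Pooled variance assumes both groups share a common variance.
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    difference = mean_a - mean_b
    margin = t_critical * std_err
    return difference, difference / std_err, difference - margin, difference + margin

# Hypothetical profits ($M) for two small groups of companies.
pharmaceutical = [10, 12, 14, 16, 18]
computer = [5, 7, 9, 11, 13]

difference, t_ratio, lower_cl, upper_cl = pooled_t_interval(
    pharmaceutical, computer, t_critical=2.306
)
print(f"Difference = {difference:.2f}, t Ratio = {t_ratio:.2f}")
print(f"95% limits: {lower_cl:.3f} to {upper_cl:.3f}")
```

The interval is centered on the estimated difference. This is consistent with the values reported above: the midpoint of the limits, (343 + 926) / 2 = 634.5, agrees with the difference of about \$635 million.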
If you have categorical X and Y variables, you can compare the proportions of the levels within the Y variable to the levels within the X variable.
This example continues to use the Companies.jmp data table. In Comparing Averages for One Variable, a financial analyst determined that pharmaceutical companies have higher profits on average than do computer companies.
 1 Select Help > Sample Data Library and open Companies.jmp.
 2 If you still have the Companies.jmp sample data table open from the previous example, you might have rows that are excluded or hidden. To return the rows to the default state (all rows included and none hidden), select Rows > Clear Row States.
 3 Select Analyze > Fit Y by X.
 4 Select Size Co and click Y, Response.
 5 Select Type and click X, Factor.
 6 Click OK.
Company Size by Company Type
The Contingency Table contains information that is not applicable for this example. From the red triangle menu for Contingency Table, deselect Total % and Col % to remove that information. Updated Contingency Table shows the updated table.
Updated Contingency Table
To answer this question, use the p-value from the Pearson test in the Tests report. See Company Size by Company Type. Since the p-value of 0.011 is less than the significance level of 0.05, the financial analyst concludes the following:
 • The differences in the sample data are not due to chance alone.
 • The percentages differ in the broader population.
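For readers who want to see what a Pearson test measures, the sketch below computes the chi-square statistic for a 2-by-3 table (two company types by three sizes). The counts are hypothetical, not the Companies.jmp counts, and the function names are illustrative. The p-value shortcut used here is valid only for 2 degrees of freedom, where the chi-square tail probability is exactly exp(-statistic / 2).

```python
from math import exp

def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

def p_value_df2(statistic):
    """Upper-tail chi-square probability; closed form for df = 2 only."""
    return exp(-statistic / 2)

# Hypothetical counts. Rows: company type. Columns: small, medium, big.
counts = [
    [15, 5, 5],
    [5, 5, 15],
]

statistic, df = pearson_chi_square(counts)
p_value = p_value_df2(statistic)
print(f"ChiSquare = {statistic:.2f}, DF = {df}, Prob>ChiSq = {p_value:.4f}")
```

A small p-value, as in this constructed table, indicates that the observed counts are unlikely if size and type were unrelated.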
The section Comparing Averages for One Variable compared averages across the levels of a categorical variable. To compare averages across the levels of two or more variables at once, use the Analysis of Variance technique (or ANOVA).
 • Type (pharmaceutical or computer)
 • Size (small, medium, big)
 1 Select Help > Sample Data Library and open Companies.jmp.
 2 Select Graph > Graph Builder. The Graph Builder window appears.
 3 Click Profits (\$M) and drag and drop it into the Y zone.
 4 Click Size Co and drag and drop it into the X zone.
 5 Click Type and drag and drop it into the Group X zone.
Graph of Company Profits
 6 Right-click on the outlier to select it, and then select Row Exclude. The point is removed, and the scale of the graph automatically updates.
 7 Click on the Bar icon. Comparing mean profits is easier with bar charts than with points.
Graph with Outlier Removed
 • if the differences are limited to this sample and due to chance
 • if the same patterns exist in the broader population
 1 Return to the Companies.jmp sample data table that has the data point excluded. See Discovering the Relationship.
 2 Select Analyze > Fit Model.
 3 Select Profits (\$M) and click Y.
 4 Select both Type and Size Co.
 5 Click the Macros button and select Full Factorial.
 6 From the Emphasis menu, select Effect Screening.
 7 Select the Keep dialog open option.
Completed Fit Model Window
 8 Click Run. The report window shows the model results.
Note: For complete details about all of the Fit Model results, see the Fitting Linear Models book.
The Effect Tests report (see Effect Tests Report) shows the results of the statistical tests. There is a test for each of the effects included in the model on the Fit Model window: Type, Size Co, and Type*Size Co.
Effect Tests Report
First, look at the test for the interaction in the model: the Type*Size Co effect. Graph with Outlier Removed showed that the pharmaceutical companies appeared to have different profits between company sizes. However, the effect test does not provide evidence of an interaction between type and size as it relates to profit: the p-value of 0.218 is large (greater than the significance level of 0.05). Therefore, remove that effect from the model, and re-run the model.
 1 Return to the Fit Model window.
 2 In the Construct Model Effects box, select the Type*Size Co effect and click Remove.
 3 Click Run.
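An interaction between type and size means that the difference in average profit between the two company types changes from one size level to another. The sketch below illustrates that idea with cell means computed from hypothetical data (not Companies.jmp). It is not the Effect Tests calculation, which also accounts for sampling variability, and the function name `cell_means` is illustrative.

```python
from collections import defaultdict

def cell_means(rows):
    """Return the mean profit for each (type, size) combination."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for company_type, size, profit in rows:
        sums[(company_type, size)] += profit
        counts[(company_type, size)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Hypothetical rows: (type, size, profit in $M).
rows = [
    ("Computer", "small", 10), ("Computer", "small", 20),
    ("Computer", "medium", 30), ("Computer", "medium", 40),
    ("Computer", "big", 50), ("Computer", "big", 60),
    ("Pharmaceutical", "small", 110), ("Pharmaceutical", "small", 120),
    ("Pharmaceutical", "medium", 130), ("Pharmaceutical", "medium", 140),
    ("Pharmaceutical", "big", 150), ("Pharmaceutical", "big", 160),
]

means = cell_means(rows)
# Gap between types within each size. Equal gaps mean no interaction
# in this constructed sample.
gaps = {
    size: means[("Pharmaceutical", size)] - means[("Computer", size)]
    for size in ("small", "medium", "big")
}
for size, gap in gaps.items():
    print(f"{size}: pharmaceutical minus computer = {gap:.1f}")
```

Because the gap is the same at every size in this constructed data, the type effect does not depend on size, which is the situation a model without the Type*Size Co term describes.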
Updated Effect Tests Report
 • There is a real difference in profits between computer and pharmaceutical companies in the broader population.
 • There is no evidence that the relationship between a company’s type and its profits depends on the company’s size.
The section Using Regression with One Predictor showed you how to build simple regression models consisting of one predictor variable and one response variable. Multiple regression predicts the average response variable using two or more predictor variables.
This example uses the Candy Bars.jmp data table, which contains nutrition information for candy bars.
 • Total fat
 • Carbohydrates
 • Protein
Use multiple regression to predict the average response variable using these three predictor variables.
 1 Select Help > Sample Data Library and open Candy Bars.jmp.
 2 Select Graph > Scatterplot Matrix.
 3 Select Calories and click Y, Columns.
 4 Select Total fat g, Carbohydrate g, and Protein g, and click X.
 5 Click OK.
Scatterplot Matrix Results
Continue to use the Candy Bars.jmp sample data table.
 1 Select Analyze > Fit Model.
 2 Select Calories and click Y.
 3 Select Total fat g, Carbohydrate g, and Protein g and click Add.
 4 Next to Emphasis, select Effect Screening.
Fit Model Window
 5 Click Run.
 • Using the Actual by Predicted Plot
 • Interpreting the Parameter Estimates
 • Using the Prediction Profiler
Actual by Predicted Plot
Another measure of model accuracy is the RSq value (which appears below the plot in Actual by Predicted Plot). The RSq value measures the proportion of the variability in calories that is explained by the model. A value closer to 1 means that the model predicts well. In this example, the RSq value is 0.99.
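The RSq value can be computed directly from actual and predicted values, as in the following sketch. The calorie values and the function name `r_square` are hypothetical and are not taken from Candy Bars.jmp.

```python
def r_square(actual, predicted):
    """Return 1 - SS(error) / SS(total) for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_error / ss_total

# Hypothetical actual and predicted calories for four candy bars.
actual_calories = [200, 250, 300, 350]
predicted_calories = [205, 245, 305, 345]

rsq = r_square(actual_calories, predicted_calories)
print(f"RSq = {rsq:.3f}")
```

When every predicted value equals the actual value, the error sum of squares is 0 and the RSq value is 1.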
 • The model coefficients
 • P-values for each parameter
Parameter Estimates Report
 • Fat = 11 g
 • Carbohydrate = 43 g
 • Protein = 2 g
Prediction Profiler
Factor Values for the Milky Way
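The idea of fitting a multiple regression model and then predicting at chosen factor values can be sketched in plain Python. The training rows below are synthetic: they are generated from the general rule of 9 calories per gram of fat and 4 calories per gram of carbohydrate or protein, which is an assumption made for this sketch and not the set of coefficients that JMP reports for Candy Bars.jmp. The function names `fit_linear_model` and `predict` are also illustrative. Only the factor values (11 g, 43 g, 2 g) come from the text above.

```python
def fit_linear_model(x_rows, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    # Build the augmented matrix [X'X | X'y].
    aug = []
    for i in range(k):
        row = [sum(d[i] * d[j] for d in design) for j in range(k)]
        row.append(sum(d[i] * yi for d, yi in zip(design, y)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]

def predict(coefficients, x):
    """Predicted response at the factor values in x."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], x))

# Synthetic rows: (total fat g, carbohydrate g, protein g).
nutrients = [
    (10, 30, 3), (14, 35, 4), (8, 40, 2),
    (12, 25, 5), (5, 45, 1), (13, 33, 6),
]
# Synthetic calories generated from the assumed 9/4/4 rule.
calories = [9 * f + 4 * c + 4 * p for f, c, p in nutrients]

coefficients = fit_linear_model(nutrients, calories)
# Factor values from the text: fat = 11 g, carbohydrate = 43 g, protein = 2 g.
prediction = predict(coefficients, (11, 43, 2))
print("Coefficients:", [round(c, 3) for c in coefficients])
print(f"Predicted calories: {prediction:.1f}")
```

Because the synthetic data follow the assumed rule exactly, the fit recovers that rule. A model fit to real data, such as the one in the Parameter Estimates report, has its own estimated coefficients and some prediction error.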