Scatterplots and other such graphs can help you visualize relationships between variables. Once you have visualized relationships, the next step is to analyze those relationships so that you can describe them numerically. That numerical description of the relationship between variables is called a model. Even more importantly, a model also predicts the average value of one variable (Y) from the value of another variable (X). The X variable is also called a predictor. Generally, this model is called a regression model.
In JMP, the Fit Y by X platform and the Fit Model platform create regression models.
Relationship Types shows the four primary types of relationships.
This example uses the Companies.jmp data table, which contains financial data for 32 companies from the pharmaceutical and computer industries.
Scatterplot of Sales ($M) versus # Employ
To predict the sales revenue from the number of employees, fit a regression model. From the red triangle for Bivariate Fit, select Fit Line. A regression line is added to the scatterplot and reports are added to the report window.
Regression Line
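For readers who want to see the kind of calculation that sits behind a fitted line, the following is a minimal Python sketch of an ordinary least-squares fit. The employee and sales figures are made-up illustrative values, not rows from Companies.jmp, and the function name fit_line is hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: number of employees and sales ($M)
employees = [1000, 2000, 3000, 4000, 5000]
sales = [150.0, 260.0, 340.0, 460.0, 540.0]

intercept, slope = fit_line(employees, sales)

# Predicted average sales for a company with 3,500 employees
predicted = intercept + slope * 3500
print(f"sales = {intercept:.1f} + {slope:.3f} * employees; prediction = {predicted:.1f}")
```

The slope is the estimated change in average sales for each additional employee, which is the quantity the regression line summarizes.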
2. Select Rows > Exclude/Unexclude.
3. Fit this model by selecting Fit Line from the red triangle menu for Bivariate Fit.
Comparing the Models
Using the results in Comparing the Models, the data analyst can make the following conclusions:
The prediction for the first model was $7499.68, so this model predicts a higher sales total by $1461.69.
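The effect of excluding a point can be sketched outside JMP by fitting the line twice, once with every row and once without the excluded row, and comparing the predictions at the same X value. The data below are hypothetical and are not meant to reproduce the dollar figures quoted above; fit_line and the excluded set are illustrative names.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical (employees in thousands, sales in $M) pairs; the last row is an outlier
rows = [(10, 20.0), (20, 40.0), (30, 60.0), (40, 80.0), (100, 50.0)]
excluded = {4}  # index of the row flagged as excluded

all_x, all_y = zip(*rows)
kept_x, kept_y = zip(*[r for i, r in enumerate(rows) if i not in excluded])

b0_all, b1_all = fit_line(all_x, all_y)
b0_kept, b1_kept = fit_line(kept_x, kept_y)

# Compare the two models at the same predictor value
x_new = 50
pred_all = b0_all + b1_all * x_new
pred_kept = b0_kept + b1_kept * x_new
difference = pred_kept - pred_all
print(f"all rows: {pred_all:.2f}, outlier excluded: {pred_kept:.2f}, difference: {difference:.2f}")
```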
This example uses the Companies.jmp data table, which contains financial data for 32 companies from the pharmaceutical and computer industries.
To answer this question, fit Profits ($M) by Type.
1. Select Help > Sample Data Library and open Companies.jmp.
2. If you still have the Companies.jmp sample data table open, you might have rows that are excluded or hidden. To return the rows to the default state (all rows included and none hidden), select Rows > Clear Row States.
3. Select Analyze > Fit Y by X.
4. Select Profits ($M) and click Y, Response.
5. Select Type and click X, Factor.
6.
Profits by Company Type
2. Select Rows > Exclude/Unexclude. The data point is no longer included in calculations.
3. Select Rows > Hide/Unhide. The data point is hidden from all graphs.
4. To re-create the plot without the outlier, select Script > Redo Analysis from the red triangle menu for Oneway Analysis. You can close the original Scatterplot window.
Updated Plot
Display Options > Mean Lines. This adds mean lines to the scatterplot.
Means and Std Dev. This displays a report that provides averages and standard deviations.
Mean Lines and Report
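As a rough illustration of what a means and standard deviations report contains, the sketch below groups values by a categorical variable and summarizes each group. The profit values are hypothetical, not taken from Companies.jmp.

```python
from statistics import mean, stdev

# Hypothetical profits ($M) labeled by company type
records = [
    ("Computer", 10.0), ("Computer", 30.0), ("Computer", 50.0),
    ("Pharmaceutical", 400.0), ("Pharmaceutical", 600.0), ("Pharmaceutical", 800.0),
]

# Collect the profit values for each level of the categorical variable
groups = {}
for company_type, profit in records:
    groups.setdefault(company_type, []).append(profit)

# One summary row per level: count, average, and sample standard deviation
summary = {
    name: {"n": len(values), "mean": mean(values), "std_dev": stdev(values)}
    for name, values in groups.items()
}

for name, stats in summary.items():
    print(f"{name}: n={stats['n']}, mean={stats['mean']:.1f}, std dev={stats['std_dev']:.1f}")
```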
To perform the t-test, select Means/Anova/Pooled t from the red triangle for Oneway Analysis.
t Test Results
Use the confidence interval limits to determine how large the difference is between the profits of the two types of companies. Look at the Upper CL Dif and Lower CL Dif values in t Test Results. The financial analyst concludes that the average profit of pharmaceutical companies is between $343 million and $926 million higher than the average profit of computer companies.
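A pooled t interval of this kind can be sketched as follows. This is an illustration under stated assumptions, not JMP's implementation: the two samples are hypothetical, the function name pooled_t_interval is made up, and the critical value is typed in by hand because the Python standard library has no Student's t quantile function.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_interval(a, b, t_crit):
    """Return (difference, t ratio, lower, upper) for mean(a) - mean(b),
    assuming both groups share a common variance (the pooled t test)."""
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return diff, diff / std_err, diff - t_crit * std_err, diff + t_crit * std_err

# Hypothetical profits ($M) for two groups of five companies each
pharma = [500.0, 600.0, 700.0, 800.0, 900.0]
computer = [0.0, 50.0, 100.0, 150.0, 200.0]

# 0.975 quantile of Student's t with 5 + 5 - 2 = 8 degrees of freedom
T_CRIT_DF8 = 2.306

diff, t_ratio, lower, upper = pooled_t_interval(pharma, computer, T_CRIT_DF8)
print(f"difference = {diff:.1f}, t ratio = {t_ratio:.2f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

When the whole interval lies above zero, as it does for these made-up numbers, the data support the conclusion that the first group has the higher average.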
If you have categorical X and Y variables, you can compare the proportions of the levels within the Y variable to the levels within the X variable.
This example continues to use the Companies.jmp data table. In Comparing Averages for One Variable, a financial analyst determined that pharmaceutical companies have higher profits on average than do computer companies.
1. Select Help > Sample Data Library and open Companies.jmp.
2. If you still have the Companies.jmp data file open from the previous example, you might have rows that are excluded or hidden. To return the rows to the default state (all rows included and none hidden), select Rows > Clear Row States.
3. Select Analyze > Fit Y by X.
4. Select Size Co and click Y, Response.
5. Select Type and click X, Factor.
6.
Company Size by Company Type
The Contingency Table contains information that is not applicable for this example. From the red triangle menu for Contingency Table, deselect Total % and Col % to remove that information. Updated Contingency Table shows the updated table.
Updated Contingency Table
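The sketch below shows how row percentages can be computed from a table of counts. It assumes that, with Total % and Col % removed, the cells of interest are the counts and the percentages within each row (each company type). The counts and level names are hypothetical.

```python
# Hypothetical counts: rows are company types, columns are company sizes
counts = {
    "Computer": {"big": 4, "medium": 6, "small": 10},
    "Pharmaceutical": {"big": 5, "medium": 3, "small": 2},
}

# Row %: each count as a percentage of its row total
row_percent = {}
for row_label, row in counts.items():
    row_total = sum(row.values())
    row_percent[row_label] = {col: 100.0 * n / row_total for col, n in row.items()}

for row_label, row in row_percent.items():
    cells = ", ".join(f"{col}: {pct:.1f}%" for col, pct in row.items())
    print(f"{row_label}: {cells}")
```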
To answer this question, use the p-value from the Pearson test in the Tests report. See Company Size by Company Type. Since the p-value of 0.011 is less than the significance level of 0.05, the financial analyst concludes the following:
The percentages differ in the broader population.
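For reference, a Pearson chi-square statistic can be sketched in a few lines. The table of counts is hypothetical and does not reproduce the p-value of 0.011 quoted above. The p-value shortcut used here is valid only when the test has exactly 2 degrees of freedom (for example, a 2-by-3 table), where the upper tail of the chi-square distribution is exp(-x/2).

```python
from math import exp

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: 2 company types (rows) by 3 company sizes (columns)
table = [[4, 6, 10], [5, 3, 2]]
statistic, dof = pearson_chi_square(table)

if dof != 2:
    raise ValueError("the closed-form p-value below applies only to 2 degrees of freedom")
p_value = exp(-statistic / 2)

print(f"chi-square = {statistic:.2f}, df = {dof}, p-value = {p_value:.3f}")
print("differs at the 0.05 level" if p_value < 0.05 else "no evidence of a difference at the 0.05 level")
```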
The section Comparing Averages for One Variable compared averages across the levels of a categorical variable. To compare averages across the levels of two or more variables at once, use the Analysis of Variance technique (or ANOVA).
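Before any formal test, it can help to look at the averages for each combination of levels. The sketch below computes those cell means for hypothetical data with two factors of two levels each, plus a simple difference-of-differences that hints at whether the effect of one factor changes with the other. This contrast is only an informal illustration for a 2-by-2 layout, not the ANOVA test itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (type, size, profit in $M) records
records = [
    ("Computer", "small", 10.0), ("Computer", "small", 20.0),
    ("Computer", "big", 100.0), ("Computer", "big", 140.0),
    ("Pharmaceutical", "small", 300.0), ("Pharmaceutical", "small", 340.0),
    ("Pharmaceutical", "big", 700.0), ("Pharmaceutical", "big", 900.0),
]

# Average profit for every combination of the two categorical variables
cells = defaultdict(list)
for company_type, size, profit in records:
    cells[(company_type, size)].append(profit)
cell_means = {key: mean(values) for key, values in cells.items()}

# Size effect within each type, and how much that effect differs between types
size_effect_pharma = cell_means[("Pharmaceutical", "big")] - cell_means[("Pharmaceutical", "small")]
size_effect_computer = cell_means[("Computer", "big")] - cell_means[("Computer", "small")]
interaction_contrast = size_effect_pharma - size_effect_computer

for key, value in cell_means.items():
    print(key, value)
print("difference of differences:", interaction_contrast)
```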
1. Select Help > Sample Data Library and open Companies.jmp.
2. Select Graph > Graph Builder. The Graph Builder window appears.
3. Click Profits ($M) and drag and drop it into the Y zone.
4. Click Size Co and drag and drop it into the X zone.
5. Click Type and drag and drop it into the Group X zone.
Graph of Company Profits
6. Right-click on the outlier to select it, and then select Row Exclude. The point is removed, and the scale of the graph automatically updates.
Graph with Outlier Removed
1. Return to the Companies.jmp sample data table that has the data point excluded. See Discovering the Relationship.
2. Select Analyze > Fit Model.
3. Select Profits ($M) and click Y.
4. Select both Type and Size Co.
5. Click the Macros button and select Full Factorial.
6.
7. Select the Keep dialog open option.
Completed Fit Model Window
8. Click Run. The report window shows the model results.
Note: For complete details about all of the Fit Model results, see the Fitting Linear Models book.
The Effect Tests report (see Effect Tests Report) shows the results of the statistical tests. There is a test for each of the effects included in the model on the Fit Model window: Type, Size Co, and Type*Size Co.
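The list of effects in a full factorial model (every main effect plus every interaction among the factors) can be enumerated with a short sketch. The function name full_factorial_effects is hypothetical and is only meant to show where the three effects named above come from.

```python
from itertools import combinations

def full_factorial_effects(factors):
    """List every main effect and every interaction among the given factors."""
    effects = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            effects.append("*".join(combo))
    return effects

effects = full_factorial_effects(["Type", "Size Co"])
print(effects)  # prints ['Type', 'Size Co', 'Type*Size Co']
```

With three factors, the same function returns seven effects: three main effects, three two-way interactions, and one three-way interaction.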
Effect Tests Report
First, look at the test for the interaction in the model: the Type*Size Co effect. Graph with Outlier Removed showed that the pharmaceutical companies appeared to have different profits between company sizes. However, the effect test provides no evidence of an interaction between type and size as it relates to profit: the p-value of 0.218 is large (greater than the significance level of 0.05). Therefore, remove that effect from the model, and re-run the model.
2. In the Construct Model Effects box, select the Type*Size Co effect and click Remove.
3. Click Run.
Updated Effect Tests Report
The section Using Regression with One Predictor showed you how to build simple regression models consisting of one predictor variable and one response variable. Multiple regression predicts the average value of the response variable using two or more predictor variables.
This example uses the Candy Bars.jmp data table, which contains nutrition information for candy bars.
Use multiple regression to predict the average response variable using these three predictor variables.
1. Select Help > Sample Data Library and open Candy Bars.jmp.
2. Select Graph > Scatterplot Matrix.
3. Select Calories and click Y, Columns.
4. Select Total fat g, Carbohydrate g, and Protein g, and click X.
5.
Scatterplot Matrix Results
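A scatterplot matrix is a visual check. As a numerical companion (a different technique, named plainly here), the sketch below computes the Pearson correlation between the response and each predictor. The nutrition values are hypothetical, not rows from Candy Bars.jmp, and pearson_r is an illustrative helper.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical candy bar data, keyed by column name
candy = {
    "Calories": [200.0, 252.0, 278.0, 225.0, 312.0],
    "Total fat g": [8.0, 12.0, 14.0, 9.0, 16.0],
    "Carbohydrate g": [30.0, 32.0, 35.0, 31.0, 36.0],
    "Protein g": [2.0, 4.0, 3.0, 5.0, 6.0],
}

# Correlation of the response with each predictor
correlations = {
    name: pearson_r(values, candy["Calories"])
    for name, values in candy.items()
    if name != "Calories"
}

for name, r in correlations.items():
    print(f"Calories vs {name}: r = {r:.3f}")
```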
Continue to use the Candy Bars.jmp sample data table.
1. Select Analyze > Fit Model.
2. Select Calories and click Y.
3. Select Total fat g, Carbohydrate g, and Protein g and click Add.
4. Next to Emphasis, select Effect Screening.
Fit Model Window
5. Click Run.
Actual by Predicted Plot
Another measure of model accuracy is the RSq value (which appears below the plot in Actual by Predicted Plot). The RSq value measures the proportion of variability in calories that is explained by the model. A value closer to 1 means that the model predicts well. In this example, the RSq value is 0.99.
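The RSq calculation can be sketched as one minus the ratio of the error sum of squares to the total sum of squares. The actual and predicted calorie values below are hypothetical, and r_squared is an illustrative function name.

```python
from statistics import mean

def r_squared(actual, predicted):
    """Proportion of the variability in `actual` that the predictions explain."""
    y_bar = mean(actual)
    ss_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - y_bar) ** 2 for a in actual)
    return 1 - ss_error / ss_total

# Hypothetical actual and model-predicted calories for four candy bars
actual = [200.0, 250.0, 300.0, 350.0]
predicted = [205.0, 245.0, 295.0, 355.0]

rsq = r_squared(actual, predicted)
print(f"RSq = {rsq:.3f}")
```

Predictions that match the actual values exactly give an RSq of 1, and larger prediction errors pull the value down.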
Parameter Estimates Report
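To show where parameter estimates of this kind come from, the following sketch fits a multiple regression by solving the least-squares normal equations with plain Python. It is an illustration, not JMP's algorithm. The candy bars are hypothetical, and their calories are generated from the common approximation of 9 kcal per gram of fat and 4 kcal per gram of carbohydrate or protein, so the fit should recover those coefficients. The function names are made up for this example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical bars: (total fat g, carbohydrate g, protein g)
bars = [(8, 30, 2), (12, 32, 4), (14, 35, 3), (9, 31, 5), (16, 36, 6), (5, 40, 1)]
# Assumption: calories follow the 9/4/4 kcal-per-gram approximation exactly
calories = [9 * f + 4 * c + 4 * p for f, c, p in bars]

intercept, b_fat, b_carb, b_protein = fit_multiple_regression(bars, calories)

# Predicted calories for a hypothetical bar with 10 g fat, 33 g carbohydrate, 3 g protein
predicted = intercept + b_fat * 10 + b_carb * 33 + b_protein * 3
print(f"fat: {b_fat:.2f}, carbohydrate: {b_carb:.2f}, protein: {b_protein:.2f}, prediction: {predicted:.1f}")
```

Plugging a set of factor values into the fitted equation, as in the last step, is the same kind of calculation a prediction profiler performs interactively.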
Prediction Profiler
Factor Values for the Milky Way