The Lipid Data.jmp data table contains blood measurements, physical measurements, and questionnaire data from 95 subjects at a California hospital. You want to create a validation column to use for validation in future analyses.
1. Select Help > Sample Data Library and open Lipid Data.jmp.
2. Select Analyze > Distribution.
3. Assign Gender to the Y, Columns role. Click OK.
Distribution of Gender in Lipid Data.jmp
The figure "Distribution of Gender in Lipid Data.jmp" shows the distribution of Gender in the data set. Notice that males and females are not equally represented. Because females are scarce in the data, you want to balance the genders across the validation and training sets.
4. Select Cols > Modeling Utilities > Make Validation Column.
5. Click Stratified Random.
6. Select Gender as the column used for validation holdback.
7. Click OK.
8. Select Analyze > Fit Y by X.
9. Assign Validation to the Y, Response role and Gender to the X, Factor role.
10. Click OK.
Distribution of Gender across Validation and Training Sets
The figure "Distribution of Gender across Validation and Training Sets" shows how Gender is distributed across the validation and training sets. About 75% of both females and males are in the training set, and about 25% of each are in the validation set.
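The idea behind the stratified holdback in the steps above can be sketched outside JMP. The following Python example is a minimal illustration and not JMP's implementation: the function name `stratified_validation_column`, the 0.25 validation fraction, and the synthetic gender counts are all assumptions chosen for illustration. It shuffles the rows within each Gender level, assigns about a quarter of each level to the validation set, and then cross-tabulates Gender by Validation, much like the Fit Y by X check.

```python
import random
from collections import Counter, defaultdict


def stratified_validation_column(strata, validation_fraction=0.25, seed=1):
    """Return a 'Training'/'Validation' label per row, balanced within each stratum."""
    rng = random.Random(seed)

    # Collect the row indices that belong to each level of the stratification column.
    rows_by_level = defaultdict(list)
    for row, level in enumerate(strata):
        rows_by_level[level].append(row)

    labels = ["Training"] * len(strata)
    for rows in rows_by_level.values():
        rng.shuffle(rows)  # random order within the stratum
        n_validation = round(len(rows) * validation_fraction)
        for row in rows[:n_validation]:
            labels[row] = "Validation"
    return labels


# Synthetic, unbalanced gender column (made-up counts, not the Lipid Data values).
gender = ["M"] * 40 + ["F"] * 12
validation = stratified_validation_column(gender)

# Cross-tabulate Gender by Validation and report the validation share per level.
counts = Counter(zip(gender, validation))
for level in ("F", "M"):
    total = sum(1 for g in gender if g == level)
    share = counts[(level, "Validation")] / total
    print(f"{level}: {share:.0%} of rows in the validation set")
```

Because the holdback fraction is applied within each level separately, the scarcer level keeps the same share in both sets, which a plain random split does not guarantee.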
Select Cols > Modeling Utilities > Make Validation Column.
Click Validation in a platform launch window. Note the following:
The Stratified Random option partitions the data into sets by balancing the values of the columns that you specify. Using this option adds a Notes column property to the new Validation column in the data table. The Notes indicate how the data were stratified to generate the Validation column. Use this option when you want equal representation of the values from a column in each of the training, validation, and testing sets.
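As a rough sketch of what balancing across training, validation, and testing sets can mean when more than one column is used, the Python example below stratifies on the combined levels of several columns. It is an illustration under stated assumptions, not JMP's algorithm: the function `stratified_partition`, the `Smoker` column, the 60/20/20 fractions, the synthetic rows, and the text of the returned note are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_partition(rows, columns, fractions, seed=1):
    """Assign each row to a set, balancing the combined levels of `columns`."""
    rng = random.Random(seed)

    # Group row indices by their combination of stratification values.
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[c] for c in columns)].append(index)

    names = list(fractions)
    assignment = [None] * len(rows)
    for indices in groups.values():
        rng.shuffle(indices)
        start = 0
        for position, name in enumerate(names):
            if position == len(names) - 1:
                stop = len(indices)  # the last set takes the remainder
            else:
                stop = start + round(len(indices) * fractions[name])
            for index in indices[start:stop]:
                assignment[index] = name
            start = stop

    # Hypothetical note recording how the rows were stratified.
    note = "Stratified by: " + ", ".join(columns)
    return assignment, note


# Synthetic rows; "Smoker" is a hypothetical second stratification column.
rows = (
    [{"Gender": "M", "Smoker": "No"}] * 40
    + [{"Gender": "M", "Smoker": "Yes"}] * 20
    + [{"Gender": "F", "Smoker": "No"}] * 20
    + [{"Gender": "F", "Smoker": "Yes"}] * 10
)
fractions = {"Training": 0.6, "Validation": 0.2, "Test": 0.2}
assignment, note = stratified_partition(rows, ["Gender", "Smoker"], fractions)
print(note)
print(Counter(assignment))
```

Keeping a note alongside the assignment mirrors the purpose of the Notes property described above: anyone who later reuses the column can see which columns were balanced.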