Publication date: 07/15/2025

Launch the Marker Simulation Platform

To simulate a genetic cross from your experimental data, launch the Marker Simulation platform by selecting Analyze > Life Sciences > Marker Simulation.

Figure 4.3 Marker Simulation Launch Window 

Marker

Select the columns that contain the markers that you want to analyze, and then click Marker.

Predictor Formula

Use this option to specify columns that contain the predictor formulas. These formulas are developed on historical data where an event has been measured or inferred. They are generated by using one or more predictive modeling processes that use predictive platforms in JMP (for example, Fit Model, Response Screening, XGBoost, and so on). The predictive models are then applied to new data for which the attributes are known, but the event has not yet occurred. See Generating Predictor Formulas for Marker Simulation.

Note: Any trait column that lacks a corresponding Predictor Formula column is ignored during the simulation.

Cross

Use this option to specify the column to be used to differentiate the parents in the crosses. For example, specifying Sex directs the platform to cross parents of different sex (male with female) only.
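The effect of the Cross column can be illustrated outside of JMP. The following Python sketch is not the platform's implementation. It assumes that every pair of parents with different values in the Cross column is a candidate cross, and the function name candidate_crosses and the sample records are hypothetical.

```python
from itertools import combinations

def candidate_crosses(parents, cross_key):
    """Return ID pairs for all parents whose cross_key values differ.

    Assumption: every pair that differs in the Cross column is a
    candidate cross. The platform may enumerate crosses differently.
    """
    return [
        (a["id"], b["id"])
        for a, b in combinations(parents, 2)
        if a[cross_key] != b[cross_key]
    ]

# Hypothetical parent records with a Sex column used as the Cross column.
parents = [
    {"id": "P1", "Sex": "male"},
    {"id": "P2", "Sex": "female"},
    {"id": "P3", "Sex": "female"},
]

# P2 and P3 share the same Sex value, so they are never paired.
print(candidate_crosses(parents, "Sex"))
```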

Sample ID

Use this option to specify one or more variables whose values can, either singly or in combination, provide a unique identifier for each row.

By

Produces a separate report for each level of the By variable. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.

Ploidy

Enables you to specify the ploidy level of the organism under investigation. Note that this must be an even number.
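One way to see why an even ploidy level is convenient is that each parent then contributes exactly half of its allele copies to a gamete. The following Python sketch is an illustration only, not the algorithm that the platform uses. It assumes numeric genotypes that count copies of one allele, independent markers (no linkage), and random sampling of allele copies. The function names gamete and cross are hypothetical.

```python
import random

def gamete(genotype, ploidy, rng):
    """Draw ploidy/2 allele copies from one parent at one marker.

    genotype is the number of copies (0..ploidy) of the counted allele.
    """
    if ploidy % 2:
        raise ValueError("ploidy must be even")
    alleles = [1] * genotype + [0] * (ploidy - genotype)
    return sum(rng.sample(alleles, ploidy // 2))

def cross(parent_a, parent_b, ploidy=2, n_progeny=4, seed=None):
    """Simulate progeny genotypes, treating markers as independent (assumption)."""
    rng = random.Random(seed)
    return [
        [gamete(a, ploidy, rng) + gamete(b, ploidy, rng)
         for a, b in zip(parent_a, parent_b)]
        for _ in range(n_progeny)
    ]

# Two diploid parents scored at three markers. One parent is heterozygous
# at the first marker, so progeny vary there.
for child in cross([1, 2, 2], [0, 0, 2], ploidy=2, n_progeny=3, seed=1):
    print(child)
```

Here n_progeny plays the role of Number of Individuals per Cross. Repeating the call with selected progeny as the new parents would correspond to additional generations.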

Number of Individuals per Cross

Enables you to specify the number of progeny (replicates) to simulate for each cross.

Number of Generations

Enables you to specify the number of generations.

Use Annotation Table

Enables you to access annotation information that is contained in a separate data table. After you click OK, you are prompted to specify the name and location of the annotation table.

Use Only Markers Found in Predictor Formula

Check this box to restrict the simulation to only those markers that are used to develop the predictor formulas. Typically, the algorithms that are used to generate the predictor formulas use a variable selection method to select a subset of the most significant markers in your data set. To see the markers that are used, right-click the column that lists the trait predictors and select Column Info.

Estimate Diversity

Select this box to calculate estimates of polymorphism, heterozygosity, and allelic diversity and frequency for the progeny of each cross.
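The exact estimators that the platform reports are not described here. As a rough illustration only, the following Python sketch computes common textbook summaries for a single diploid marker coded 0, 1, or 2. The function name diversity_summary and the sample genotypes are hypothetical.

```python
def diversity_summary(genotypes):
    """Summarize one diploid marker coded 0/1/2 (copies of the counted allele).

    These are textbook quantities and are an assumption about what
    "diversity" means here, not the platform's documented formulas.
    """
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)  # frequency of the counted allele
    return {
        "allele_freq": p,
        "polymorphic": 0 < p < 1,                # both alleles are present
        "observed_het": genotypes.count(1) / n,  # share of heterozygotes
        "expected_het": 2 * p * (1 - p),         # expectation under HWE
    }

# Hypothetical progeny genotypes at one marker.
print(diversity_summary([0, 1, 1, 2]))
```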

Missing Marker Imputation Method

Use this option to specify how missing marker values are to be imputed. Because the Marker Simulation method does not run when your data is missing marker data, you must impute any missing data.

Select HWE Off to impute the missing genotypes with random draws from a multinomial distribution in which the frequency of each genotype class is set to be the observed frequency from the data.

Select HWE On to impute the missing genotypes with random draws from a multinomial distribution in which the frequency of each genotype class is set to be the expected frequency under the assumption of the Hardy-Weinberg equilibrium (HWE).

Select Random to randomly assign one of the acceptable values (0, 1, 2, ..., K, where K is the ploidy level).

Select Specified to impute the missing genotypes with a specified integer between zero and the ploidy number.
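The four imputation methods can be compared in a small Python sketch. This is an illustration under stated assumptions, not the platform's code. Missing values are represented by None, the HWE expectation is taken to be a binomial expansion of the observed allele frequency (which gives the familiar p^2, 2pq, q^2 classes for diploids), and the function names genotype_weights and impute are hypothetical.

```python
import random
from collections import Counter
from math import comb

def genotype_weights(observed, ploidy, hwe):
    """Sampling weights for genotype classes 0..ploidy from non-missing calls."""
    if not hwe:
        # HWE Off: use the observed count of each genotype class.
        counts = Counter(observed)
        return [counts.get(g, 0) for g in range(ploidy + 1)]
    # HWE On (assumed binomial model): expected class frequencies from
    # the frequency p of the counted allele.
    p = sum(observed) / (ploidy * len(observed))
    return [comb(ploidy, g) * p**g * (1 - p) ** (ploidy - g)
            for g in range(ploidy + 1)]

def impute(column, ploidy=2, method="HWE Off", value=None, rng=None):
    """Return a copy of column with None replaced according to method.

    Assumes the column has at least one non-missing call for the HWE methods.
    """
    rng = rng or random.Random()
    classes = list(range(ploidy + 1))
    observed = [g for g in column if g is not None]
    if method == "Specified" and (value is None or not 0 <= value <= ploidy):
        raise ValueError("value must be an integer from 0 to ploidy")
    out = []
    for g in column:
        if g is not None:
            out.append(g)
        elif method == "Specified":
            out.append(value)
        elif method == "Random":
            out.append(rng.choice(classes))
        else:
            weights = genotype_weights(observed, ploidy, hwe=(method == "HWE On"))
            out.append(rng.choices(classes, weights=weights, k=1)[0])
    return out

column = [0, 1, None, 2, None]
print(impute(column, method="Specified", value=1))
print(impute(column, method="HWE On", rng=random.Random(7)))
```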

Imputation Value

Use this option to specify a value to use in place of any missing genotypes.

To impute with the recessive, dominant, or heterozygous genotype, select Specified, and then enter a number from 0 to the ploidy level in the Imputation Value box. For diploid organisms, enter 0 for Recessive Homozygous, 1 for Heterozygous, and 2 for Dominant Homozygous.

Note: This option is available only when Specified is chosen as the Missing Marker Imputation Method.

Select Best Individuals

Check this box to select only the progeny that meet specified trait criteria in each generation for use in the subsequent cross. You must specify the selection criteria for each trait that is used for the selection. You can specify a lower limit, an upper limit, or a specific target value.

Specify a lower limit to select progeny with a trait value that is greater than or equal to this limit to move to the next generation. Specify an upper limit to select progeny with a trait value that is less than or equal to this limit to move to the next generation. Specify a target value to select progeny with a trait value that is equal to this target to move to the next generation.

Note: Specify a target value only when the trait is non-continuous.

You can specify both an upper limit and a lower limit for any given trait to select only the progeny with trait values that fall within the interval formed by the upper and lower limits. Specification of a target value together with either an upper or a lower limit is not valid.

The final selection criterion is the intersection of all criteria that are specified for the traits. For example, if the Spec Limits are such that L1 <= Trait1, L2 <= Trait2 <= U2, and Trait3 == T3, then the selection criterion is constructed to be L1 <= Trait1 and L2 <= Trait2 <= U2 and Trait3 == T3. Any progeny that satisfies this criterion is selected for the next generation.
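The intersection rule can be sketched in Python as follows. This is a minimal illustration of the logic described above, not the platform's implementation. The function names meets_spec and select_progeny, the trait names, and the sample values are hypothetical.

```python
def meets_spec(value, lower=None, upper=None, target=None):
    """Check one trait value against its limits or its target."""
    if target is not None:
        # A target cannot be combined with a lower or an upper limit.
        if lower is not None or upper is not None:
            raise ValueError("a target value cannot be combined with a limit")
        return value == target
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True

def select_progeny(progeny, specs):
    """Keep only progeny that satisfy the criteria for every trait in specs."""
    return [p for p in progeny
            if all(meets_spec(p[trait], **spec) for trait, spec in specs.items())]

# Hypothetical criteria: L1 <= Yield, L2 <= Height <= U2, Color == T3.
specs = {
    "Yield": {"lower": 5.0},
    "Height": {"lower": 80, "upper": 100},
    "Color": {"target": "red"},
}
progeny = [
    {"id": "F1-1", "Yield": 5.2, "Height": 90, "Color": "red"},
    {"id": "F1-2", "Yield": 4.8, "Height": 90, "Color": "red"},
    {"id": "F1-3", "Yield": 6.0, "Height": 105, "Color": "red"},
    {"id": "F1-4", "Yield": 5.0, "Height": 80, "Color": "red"},
]
print([p["id"] for p in select_progeny(progeny, specs)])
```

Both limits are inclusive, so F1-4, which sits exactly on the lower limits, is retained.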

For details about how to specify criteria for selecting progeny, see Specifying Trait Selection Criteria for Marker Simulation.

Note: This option is ignored unless Spec Limits have been specified for at least one of the Predictor Formula columns.

Number of Selected Individuals

Use this option to specify an upper limit on the number of progeny that meet the trait selection criteria in each generation to use as parents in the subsequent cross. This limit is applied repeatedly for each subsequent generation.

Number of Selected Crosses

Use this option to specify an upper limit on the number of crosses that meet the trait selection criteria. Progeny from the previous cross are assessed against the selection criteria, and this limit is then applied, if needed, to the subsequent cross. This limit is applied repeatedly for each subsequent generation.

Threshold to Make Line Plots

Use this option to set an upper limit to the number of crosses that are used for generating the line plots. Generating line plots that represent multi-generational crosses requires substantial computer resources. Trying to generate too many line plots can overwhelm your computer’s resources. Should the number of crosses made exceed the specified value, JMP does not attempt to generate these plots.

Set Random Seed

Use this option to specify a nonnegative integer to start the random number stream. Different values produce different outcomes of the algorithm.
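The role of a seed can be illustrated with Python's standard random number generator, which is used here only as a stand-in for the platform's own random number stream. The function name simulate_draws is hypothetical.

```python
import random

def simulate_draws(seed, n=5):
    """Draw n genotype values (0, 1, or 2) from a generator started at seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 2) for _ in range(n)]

# Repeating a run with the same seed reproduces the same draws.
print(simulate_draws(2025) == simulate_draws(2025))
```

Recording the seed with your results makes it possible to reproduce a simulation run later.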

Imputation Value

Use this option to specify a value to insert into any cell containing a missing value symbol.

Unthreaded

Use this option to suppress multi-threading. Deselect this option for improved computational speed.

Required Data Format for the Marker Simulation Platform

Most of the processes in JMP assume that the input table has a particular data structure. JMP distinguishes between tall and wide data sets. A tall data table has samples as columns and molecular entities (for example, markers, genes, clones, proteins, or metabolites) as rows. A wide data table is the transpose of the tall data table: it has samples as rows and molecular entities as columns.

When specifying the input data set for a process, it is important to know the required form. Marker Simulation requires a wide data table. The Transpose platform under the Tables menu enables you to transform your data tables between tall and wide forms.
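The relationship between the tall and wide forms can be shown with a small table held as a list of rows in Python. This is only an illustration of the two layouts and does not replace the Transpose platform. The marker and sample names are hypothetical.

```python
# Tall form: one row per marker, one column per sample.
tall = [
    ["Marker", "S1", "S2"],
    ["M1", 0, 1],
    ["M2", 2, 1],
]

# Wide form: one row per sample, one column per marker.
wide = [list(row) for row in zip(*tall)]

for row in wide:
    print(row)
```

After the transpose, the first column holds the sample names and each remaining column holds one marker, which is the layout that Marker Simulation expects.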

Marker data must be encoded in the one-column, numeric format. Typically, in this format, diploid individuals that are homozygous for the least common (minor) allele are represented in the table by a 2, whereas heterozygotes are represented by a 1. Homozygotes for the most common allele are represented by a 0.
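The numeric coding amounts to counting copies of the minor allele in each genotype. The following Python sketch illustrates this for one biallelic, diploid marker whose genotypes are written as two-letter strings. That input format is an assumption made for the example, and the function name encode_marker is hypothetical.

```python
from collections import Counter

def encode_marker(genotypes):
    """Encode two-letter diploid genotypes as counts of the minor allele.

    Assumes a biallelic marker. If both alleles are equally common, the
    allele seen first is treated as the minor allele.
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        # Monomorphic marker: no minor allele is present.
        return [0] * len(genotypes)
    minor = min(counts, key=counts.get)
    return [g.count(minor) for g in genotypes]

# A occurs five times and G three times, so G is the minor allele.
print(encode_marker(["AA", "AG", "GG", "AA"]))
```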
