Overview of Key JMP Platforms for Analysis of Wide Data

JMP offers a variety of platforms that are particularly useful for exploring and analyzing your wide data.

File > Open handles a variety of formats for individual files, and Import Multiple Files handles collections of files.

Cols > Group Columns is handy for grouping a large number of columns for subsequent referencing, and Cols > Standardize Attributes changes attributes for one or more columns.

The Distribution platform enables you to analyze both categorical and continuous variables. For categorical variables, the initial graph that appears is a histogram that shows a bar for each level of the ordinal or nominal variable. For numeric continuous variables, the initial graphs show a histogram and an outlier box plot. The histogram shows a bar for grouped values of the continuous variable.
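As a rough illustration of what these initial graphs summarize, the following Python sketch computes level counts for a categorical column, equal-width bin counts for a continuous column, and the fences at 1.5 times the interquartile range that are commonly used to flag points in an outlier box plot. The function names, the bin count, and the quantile method are assumptions made for this example. JMP chooses its own bins and quantile definitions, so the platform's report might show different numbers.

```python
import statistics
from collections import Counter


def level_counts(values):
    """Count how often each level of a categorical column occurs."""
    return Counter(values)


def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant data
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def outlier_fences(values):
    """Return (lower, upper) fences at 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical example data.
heights = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
lower, upper = outlier_fences(heights)
outliers = [v for v in heights if v < lower or v > upper]
```

With this hypothetical data, the single value far from the rest falls outside the fences and would be drawn as an individual point in a box plot.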

The Multivariate platform enables you to examine multiple variables to see how they relate to each other.
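One common way to examine how several variables relate to each other is a matrix of pairwise Pearson correlations. The sketch below shows that calculation in plain Python. The function names and the column names are hypothetical, and the sketch assumes complete data with no constant columns.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict of correlations for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical columns: y rises with x, z falls as x rises.
table = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
corr = correlation_matrix(table)
```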

Graph Builder enables you to quickly create and experiment with plots to interactively explore your data.

Explore Missing Values provides several ways to identify and understand the missing values in your data and to conduct multivariate imputation for those missing values.
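Two simple summaries that help in understanding missing values are the number of missing cells per column and the frequency of each missingness pattern across rows. The following Python sketch computes both. It assumes that rows are dictionaries and that a missing cell is represented by None; the function name is hypothetical, and the sketch does not attempt imputation.

```python
from collections import Counter


def missing_summary(rows, columns):
    """Return per-column missing counts and row-level missingness patterns.

    A pattern is a tuple with 1 where the column is missing and 0 otherwise,
    in the order given by `columns`.
    """
    per_column = {c: sum(1 for r in rows if r.get(c) is None)
                  for c in columns}
    patterns = Counter(
        tuple(int(r.get(c) is None) for c in columns) for r in rows
    )
    return per_column, patterns


# Hypothetical data table with two columns.
rows = [
    {"a": 1, "b": None},
    {"a": None, "b": None},
    {"a": 2, "b": 3},
]
per_column, patterns = missing_summary(rows, ["a", "b"])
```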

Hierarchical Clustering is a multivariate technique that successively groups together observations that share similar values across a number of variables.
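The successive grouping can be sketched as a simple agglomerative loop: start with every observation in its own cluster, then repeatedly merge the two closest clusters. The sketch below uses single linkage and Euclidean distance purely for brevity. Both choices are assumptions for this example and are not necessarily what the platform uses by default.

```python
import math


def agglomerate(points, n_clusters):
    """Merge the closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of row indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical observations measured on two variables.
observations = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 50)]
groups = agglomerate(observations, 3)
```

This brute-force loop recomputes every pairwise distance at each step, so it is suitable only for small examples.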

Principal Components enables you to derive a small number of independent linear combinations (principal components) of a set of measured variables that capture as much of the variability in the original variables as possible. Principal component analysis is a dimension-reduction technique, as well as an exploratory data analysis tool.
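To make the idea of a linear combination that captures as much variability as possible concrete, the following Python sketch finds the first principal component of a small data set by power iteration on the covariance matrix. Working on covariances rather than correlations, the fixed iteration count, and the function name are all assumptions for this example, not a description of how the platform computes its results.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (loadings, share_of_variance) for the first component.

    Assumes at least two rows and at least one column with nonzero variance.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]

    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its leading eigenvector, unless the starting vector
    # happens to be orthogonal to that eigenvector.
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    total_variance = sum(cov[a][a] for a in range(p))
    return v, eigenvalue / total_variance


# Hypothetical data in which the second variable is twice the first.
data = [(1, 2), (2, 4), (3, 6), (4, 8)]
loadings, share = first_principal_component(data)
```

Because the two hypothetical variables are perfectly collinear, a single component accounts for essentially all of the variance.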

Multivariate Embedding enables you to map data from a very high-dimensional space to a low-dimensional space that can be easily visualized, so that clusters of near neighbors can be more easily identified.

Marker Statistics provides a convenient method for exploring several properties of all the biallelic markers in a data set, for the purpose of quality control (QC) and possibly selecting markers to be removed from the analysis.
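Typical QC properties for a biallelic marker include the call rate, the minor allele frequency, and the observed heterozygosity. The sketch below computes these three quantities in Python. It assumes that each genotype is coded as the count of one allele (0, 1, or 2) with None for a missing call. The coding, the function name, and the choice of statistics are assumptions for this example and might not match the platform's report.

```python
def marker_qc(genotypes):
    """Summarize one biallelic marker coded as 0, 1, 2, or None (missing)."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return {"call_rate": call_rate, "maf": None, "heterozygosity": None}
    # Each called individual carries two alleles at the marker.
    allele_freq = sum(called) / (2 * len(called))
    return {
        "call_rate": call_rate,
        "maf": min(allele_freq, 1 - allele_freq),
        "heterozygosity": sum(1 for g in called if g == 1) / len(called),
    }


# Hypothetical genotype calls for eight individuals at one marker.
stats = marker_qc([0, 1, 2, 1, None, 0, 0, 1])
```

A marker with a low call rate or a very small minor allele frequency is a typical candidate for removal, although the thresholds depend on the study.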

Marker Simulation simulates the progeny from a specified set of crosses using biallelic markers and predictor formulas saved in your data table. This process enables you to test various crosses to estimate which crosses will generate progeny with the desired combinations of traits.
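The general idea of simulating progeny from a cross can be sketched in a few lines of Python. This sketch assumes that genotypes are coded as allele counts (0, 1, or 2), that markers segregate independently (no linkage), and that the predictor formula is a simple additive model with hypothetical per-marker effects. These simplifications are for illustration only and are not a description of the platform's simulation.

```python
import random


def gamete(parent, rng):
    """Draw one allele (0 or 1) per marker from a parent's genotype."""
    # Homozygotes always pass the same allele; heterozygotes pass either
    # allele with equal probability.
    return [g // 2 if g != 1 else rng.randint(0, 1) for g in parent]


def simulate_cross(parent_a, parent_b, n_progeny, seed=0):
    """Simulate progeny genotypes from a cross of two parents."""
    rng = random.Random(seed)
    return [
        [a + b for a, b in zip(gamete(parent_a, rng), gamete(parent_b, rng))]
        for _ in range(n_progeny)
    ]


def predicted_trait(genotype, effects, intercept=0.0):
    """Hypothetical additive predictor: intercept + sum(effect * count)."""
    return intercept + sum(e * g for e, g in zip(effects, genotype))


# Hypothetical parents genotyped at three markers.
progeny = simulate_cross([0, 1, 2], [2, 1, 2], n_progeny=100)
traits = [predicted_trait(p, effects=[0.5, -1.0, 2.0]) for p in progeny]
```

Comparing the distribution of predicted traits across several candidate crosses gives a rough sense of which crosses are more likely to produce the desired progeny.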

Response Screening automates the process of conducting tests across a large number of responses.
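When many tests are run at once, some small p-values are expected by chance, so the p-values are usually adjusted for multiplicity. The sketch below shows one widely used adjustment, the Benjamini-Hochberg false discovery rate procedure, in Python. The function name is hypothetical, and the example is meant to show the kind of calculation involved rather than the platform's exact output.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per response.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
```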

The Fit Model platform provides an efficient way to specify models that have complex effect structures. These effect structures are linear in the model parameters. Once you have specified your model, you can select the appropriate fitting technique from a number of fitting personalities.

  • The Response Screening personality uses univariate tests of all responses against linear model effects.
  • Use Logistic Regression to model the probabilities of the levels of a categorical Y response variable as a function of one or more X effects. The Fit Model platform provides two personalities for fitting logistic regression models. The personality that you use depends on the modeling type (Nominal or Ordinal) of your response column.
  • Use the Mixed Model personality to specify fixed effects, random effects, a repeated structure, or a combination of those. This personality allows for unbounded variance components, which means that negative variance component estimates are reported as estimated rather than set to zero.
  • Finally, the Fit Model platform has additional options to suppress reports and funnel results into data tables. These options appear under Options for Many Responses: Suppress Reports, Results in Data Table, and Dispose Reports.
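To see how a variance component estimate can be negative, as noted for the Mixed Model personality, consider a balanced one-way layout in which the group means are closer together than the within-group variation would suggest. The Python sketch below uses the method-of-moments (ANOVA) estimator, a simple estimator chosen here for illustration and not the fitting method of the personality itself. The function name and the data are hypothetical.

```python
def between_group_variance(groups):
    """Method-of-moments estimate of the between-group variance component.

    Assumes a balanced layout: every group has the same number of values.
    """
    k = len(groups)
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    ) / (k * (n - 1))
    # The estimate is negative whenever ms_between < ms_within.
    return (ms_between - ms_within) / n


# Hypothetical data: identical group means but large within-group spread.
unbounded = between_group_variance([[1, 9], [2, 8], [0, 10]])
bounded = max(unbounded, 0.0)  # what a bounded estimate would report
```

Here the unbounded estimate is negative, whereas a bounded estimate would be reported as zero.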