Publication date: 12/16/2025

Launch the Normalization Platform

Launch the Normalization platform by selecting Analyze > Life Sciences > Normalization.

Figure 8.2 The Normalization Launch Window 

For information about the options in the Select Columns red triangle, see “Column Filter Menu” in Using JMP.

Y, Columns

Select the sample columns that you want to analyze, and then click Y, Columns.

X, Control Columns

Use this option to specify columns that contain known reference or control samples, which are expected to represent a normal population. These columns help anchor the normalization for methods such as TMM.

Length

Use this option to specify the column containing the length of each transcript.

By

Produces a separate report for each level of the By variable. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.

Method

Select the algorithm to use for performing the normalization.

Select Row Standardize to scale each row to have a mean of 0 and a standard deviation of 1. This method requires continuous data; it is not ideal for compositional data or raw counts. This method is useful for highlighting relative variation within samples, especially in conjunction with clustering and principal components analysis.
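
As a rough illustration of what Row Standardize computes, the following Python sketch scales one row to a mean of 0 and a standard deviation of 1. It is not the platform's implementation. The function name row_standardize, the use of the sample standard deviation (n - 1 denominator), and the handling of constant rows are assumptions made for this example.

```python
import statistics

def row_standardize(row):
    """Scale one row of values to mean 0 and standard deviation 1."""
    mean = statistics.mean(row)
    sd = statistics.stdev(row)  # assumption: sample SD (n - 1 denominator)
    if sd == 0:
        # assumption: a constant row has no variation, so return zeros
        return [0.0 for _ in row]
    return [(x - mean) / sd for x in row]

standardized = row_standardize([2.0, 4.0, 6.0])
print(standardized)  # [-1.0, 0.0, 1.0]
```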

Select Relative Abundance Percentage (RAP) to convert raw counts into percentage abundances within each sample. This method is used for count data. It emphasizes relative proportions; it should not be used for statistical models that are sensitive to compositional bias. It is particularly useful for exploratory analysis and for visualizations that emphasize relative amounts, such as pie charts or bar graphs.
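
The RAP calculation can be sketched in a few lines of Python. This is an illustration only, not the platform's code. The function name relative_abundance_percentage and the choice to raise an error for an all-zero sample are assumptions.

```python
def relative_abundance_percentage(counts):
    """Convert one sample's raw counts to percentages of the sample total."""
    total = sum(counts)
    if total == 0:
        # assumption: an empty sample is treated as an error
        raise ValueError("sample has no counts")
    return [100.0 * c / total for c in counts]

percentages = relative_abundance_percentage([10, 30, 60])
print(percentages)  # [10.0, 30.0, 60.0]
```

The percentages within a sample always sum to 100, which is the compositional constraint mentioned above.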

Select Relative Abundance Ratio (RAR) to express each feature's abundance as a ratio relative to a reference (for example, a control or geometric mean). This method is used for count data. It highlights fold changes relative to a baseline. It is not ideal for skewed or sparse data.

Select Centered Log Ratio (CLR) to log-transform compositional data centered on the geometric mean. This method removes the closure effect in compositional data. It is particularly suited for microbiome and metagenomics datasets. It is best not to use this method with data that contain “zero” counts, because the logarithm of zero is undefined.
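
A minimal Python sketch of the CLR transform for a single sample follows. It assumes natural logarithms and rejects non-positive values outright. Both choices, and the function name centered_log_ratio, are assumptions for illustration and might differ from what the platform does.

```python
import math

def centered_log_ratio(composition):
    """Log-transform one sample and center it on its geometric mean."""
    if any(x <= 0 for x in composition):
        # the logarithm is undefined for zero or negative values
        raise ValueError("CLR requires strictly positive values")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]

clr_values = centered_log_ratio([1.0, 10.0, 100.0])
print(clr_values)
```

In this example the geometric mean is 10, so the middle value maps to approximately 0 and the transformed values sum to approximately 0.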

Select Count/Read Per Million (CPM/RPM) to scale counts by the total reads in each sample and then multiply by one million. This method adjusts sequencing data for differences in sequencing depth when comparing expression across samples. It is best not to use this method with data that contain “zero” counts.
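
The CPM scaling described above can be sketched as follows. The function name counts_per_million is hypothetical, and the sketch handles one sample at a time.

```python
def counts_per_million(counts):
    """Scale one sample's counts by its total reads, times one million."""
    total = sum(counts)
    if total == 0:
        # assumption: a sample with no reads is treated as an error
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]

cpm_values = counts_per_million([5, 15, 80])
print(cpm_values)  # [50000.0, 150000.0, 800000.0]
```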

Select Read/Fragment Per Kilobase of Transcript Per Million Mapped Reads/Fragments (RPKM/FPKM) to normalize sequencing data for both gene length and sequencing depth. This method is used for count data. It is particularly useful for transcriptomic expression profiling. Because transcript length is taken into account, this method is best suited to comparing features of different lengths within a sample.
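
The following Python sketch shows the standard RPKM formula for one sample: reads divided by transcript length in kilobases and by total mapped reads in millions. The function name rpkm and the argument lengths_bp (transcript lengths in base pairs, as might be supplied through the Length role) are assumptions for illustration.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) factors
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]

rpkm_values = rpkm([100, 200, 700], [1000, 2000, 500])
print(rpkm_values)
```

The first two features have different raw counts (100 and 200) but the same RPKM, because the second transcript is twice as long.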

Select Transcript Per Million (TPM) to normalize sequencing data for both gene length and sequencing depth. This method is similar to RPKM but is more suitable for between-sample comparisons. This method is used for count data. It reduces compositional bias in RNA-seq data. It is best for differential expression analysis (for example, edgeR) but should not be used for very sparse or low-depth datasets.
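
A short Python sketch of the usual TPM calculation for one sample follows. Counts are first divided by transcript length, and the resulting rates are then rescaled to sum to one million. The function name tpm and the argument lengths_bp (lengths in base pairs) are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample."""
    # reads per kilobase of transcript
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        raise ValueError("sample has no reads")
    return [rate * 1e6 / total_rate for rate in rates]

tpm_values = tpm([100, 200, 700], [1000, 2000, 500])
print(tpm_values)
```

Because the length adjustment happens before the depth adjustment, TPM values sum to one million in every sample, which makes them easier to compare across samples.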

Select Trimmed Mean of M-Values (TMM) to perform a normal or log-ratio normalization using a reference sample, trimmed to reduce bias. This method uses one or more reference or control samples if they are provided; otherwise, it automatically selects a reference sample. This method reduces compositional bias in RNA-seq data and is best for differential expression analysis (for example, edgeR). It is not ideal for very sparse or heterogeneous datasets.
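
To make the idea of a trimmed mean of M-values concrete, here is a simplified Python sketch that computes a scaling factor for one sample against one reference sample. It is not the platform's implementation. It omits the precision weights used in the published TMM method, and the trim fractions (30% of M-values and 5% of A-values from each tail, the defaults in edgeR's calcNormFactors) are assumed here rather than taken from JMP. The function and parameter names are hypothetical.

```python
import math

def tmm_scaling_factor(sample, reference, m_trim=0.30, a_trim=0.05):
    """Simplified, unweighted TMM scaling factor for one sample."""
    n_sample, n_reference = sum(sample), sum(reference)
    m_values, a_values = [], []
    for s, r in zip(sample, reference):
        if s > 0 and r > 0:  # logs are undefined for zero counts
            p_s, p_r = s / n_sample, r / n_reference
            m_values.append(math.log2(p_s / p_r))
            a_values.append(0.5 * math.log2(p_s * p_r))
    n = len(m_values)

    def untrimmed(values, trim):
        """Indices that remain after trimming a fraction from each tail."""
        k = int(n * trim)
        order = sorted(range(n), key=lambda i: values[i])
        return set(order[k:n - k])

    keep = untrimmed(m_values, m_trim) & untrimmed(a_values, a_trim)
    if not keep:
        raise ValueError("no features left after trimming")
    mean_m = sum(m_values[i] for i in keep) / len(keep)
    return 2.0 ** mean_m

reference = [100] * 10
sample = [100] * 9 + [1000]  # one strongly over-represented feature
factor = tmm_scaling_factor(sample, reference)
print(factor)
```

In this example the trimming discards the over-represented feature, so the factor is about 10/19 (roughly 0.526). Multiplying the sample total of 1900 by this factor gives an effective library size of 1000, which matches the reference total.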

Select Kernel Density Mean of M-Component (KDMM) to perform a normal or log-ratio density-aware normalization using smoothed M-component distributions. This method takes weighted means of log-ratios across kernel-estimated density regions and adjusts for multimodal and nonlinear compositional structure. It can be applied to count or compositional data. It is best for advanced microbiome and metagenomics normalization but should not be used for small sample sizes or low-resolution data.

Missing Value Imputation

Check this option when there are missing values to be imputed.

Replace Y-Data

Check this option to replace the original Y-columns with the corresponding normalized Y-columns.

Create New Tables for Results

Check this option to create new tables for the output data and results. When this option is unchecked, the output is added to the end of the input data table.

M Threshold

Use this advanced option to set a cutoff for log fold-change (M-values) between samples. This trims out extreme values that might distort normalization. Formula: M = log2(sample / reference). Use of this option helps improve stability, especially in datasets with strong differential expression.

This option is used for TMM and KDMM only.

A Threshold

Use this advanced option to apply a lower limit to average log-intensity (A-values) so that low-abundance or noisy features do not skew the normalization. Formula: A = 0.5 × log2(sample × reference).

This option is used for TMM and KDMM only.
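
The M and A formulas given for the M Threshold and A Threshold options can be evaluated directly, as in the Python sketch below. The filtering rule in passes_thresholds (keep a feature when |M| is at most the M threshold and A is at least the A threshold) is an assumption about how such cutoffs are typically applied, not a statement of the platform's behavior. All names and threshold values are hypothetical.

```python
import math

def ma_values(sample, reference):
    """Return (M, A) for one feature from a sample and a reference value."""
    m = math.log2(sample / reference)
    a = 0.5 * math.log2(sample * reference)
    return m, a

def passes_thresholds(sample, reference, m_threshold, a_threshold):
    """Assumed rule: keep if |M| <= m_threshold and A >= a_threshold."""
    m, a = ma_values(sample, reference)
    return abs(m) <= m_threshold and a >= a_threshold

m, a = ma_values(8.0, 2.0)
print(m, a)  # 2.0 2.0

pairs = [(8.0, 2.0), (4.0, 4.0), (1.0, 1.0)]
kept = [p for p in pairs
        if passes_thresholds(*p, m_threshold=1.5, a_threshold=1.0)]
print(kept)  # [(4.0, 4.0)]
```

The first pair is dropped for its large fold change (M = 2), and the last pair is dropped for its low average intensity (A = 0).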

Prior Count

Use this advanced option to add a small constant to all counts before log transformation (for example, log2(count + 0.5)). This prevents issues with zeros in sparse datasets.

This option is used for TMM and KDMM only.
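
The effect of a prior count is easy to see in a small Python sketch. The function name log2_with_prior is hypothetical, and the default of 0.5 simply mirrors the example in the text above.

```python
import math

def log2_with_prior(counts, prior_count=0.5):
    """Add a small constant to each count before taking log2."""
    return [math.log2(c + prior_count) for c in counts]

logged = log2_with_prior([0, 1.5, 3.5])
print(logged)  # [-1.0, 1.0, 2.0]
```

Without the prior count, the zero in the first position would raise an error, because log2(0) is undefined.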

Interpolated Quantiles

Use this advanced option to enable the use of smoothed quantiles during normalization, which can be especially helpful in KDMM for complex or uneven data distributions.

This option is used for TMM and KDMM only.

Log Transformed

Use this advanced option to convert the raw counts to log-transformed values.

This option is used for TMM and KDMM only.

Keep dialog open

Check this box to keep this platform dialog open after the analysis is run.

Data Format

Most of the processes in JMP assume that the input table has a particular data structure. JMP distinguishes between tall and wide data tables. A tall data table has samples as columns and molecular features (for example, genes, transcripts, proteins, or metabolites) as rows, whereas a wide data table is the transpose of the tall data table, having the samples as rows and molecular features as columns.

When specifying the input data set for the Normalization platform, it is important to know the required form. The Normalization platform expects a tall data table, where each row represents a molecular feature and each column represents a sample or condition. If your data are in the wide form, you can use the Transpose platform under the Tables menu to convert between tall and wide formats before normalization.
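
For readers who prepare data outside JMP, the relationship between the wide and tall layouts can be illustrated with a small Python sketch. It only illustrates the two layouts and is not a substitute for the Transpose platform. The sample and gene names are made up.

```python
# Wide layout: one row per sample, one column per molecular feature
wide = [
    ["sample", "geneA", "geneB", "geneC"],
    ["S1", 10, 0, 5],
    ["S2", 12, 3, 7],
]

# Tall layout: one row per molecular feature, one column per sample
tall = [list(column) for column in zip(*wide)]
tall[0][0] = "feature"  # relabel the corner cell after transposing

for row in tall:
    print(row)
```

The resulting tall table has the header row feature, S1, S2, followed by one row for each gene.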

Normalization methods require numeric data values representing measured or counted abundances (for example, read counts, intensities, or relative abundances). Non-numeric or categorical data columns should be excluded from the normalization process.
