Advanced Data Table Scripting

Summarize collects summary statistics for a data table and stores them in global variables. Named arguments include the following: Count, Sum, Mean, Min, Max, Std Dev, First, Corr, and Quantile. Each argument takes a data column.

Note the following:

•	If a name=By(groupvar) argument is included, each name receives one value per subgroup: the By name receives a list of the group levels, and each statistic name receives a matrix of subgroup statistics.

•	Count does not require a column argument, but it is often useful to specify a column so that Count returns the number of nonmissing values in that column.

•	Quantile also takes a second argument that specifies the quantile, such as 0.1 for the 10th percentile.

Note: Excluded rows are omitted from Summarize calculations. If all rows are excluded, Summarize returns lists of missing values. If the data table has no rows (all rows have been deleted), Summarize returns empty lists.

The following example uses the Big Class sample data table:

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
Summarize(
	a = By( :age ),
	c = Count,
	sumHt = Sum( :height ),
	meanHt = Mean( :height ),
	minHt = Min( :height ),
	maxHt = Max( :height ),
	sdHt = Std Dev( :height ),
	q10Ht = Quantile( :height, .10 )
);
Show( a, c, sumHt, meanHt, minHt, maxHt, sdHt, q10Ht );

Because the script includes a By group, the results are a list and seven matrices:

a = {"12", "13", "14", "15", "16", "17"}

c = [8, 7, 12, 7, 3, 3]

sumHt = [465, 422, 770, 452, 193, 200]

meanHt = [58.125, 60.28571428571428, 64.16666666666667, 64.57142857142857, 64.33333333333333, 66.66666666666667]

minHt = [51, 56, 61, 62, 60, 62]

maxHt = [66, 65, 69, 67, 68, 70]

sdHt = [5.083235752381126, 3.039423504234876, 2.367712103711172, 1.988059594776032, 4.041451884327343, 4.163331998932229]

q10Ht = [51, 56, 61.3, 62, 60, 62]

You can format the results using Table Box. For details, see Build Your Own Displays from Scratch in Display Trees.

New Window( "Summary Results",
	Table Box(
		String Col Box( "Age", a ),
		Number Col Box( "Count", c ),
		Number Col Box( "Sum", sumHt ),
		Number Col Box( "Mean", meanHt ),
		Number Col Box( "Min", minHt ),
		Number Col Box( "Max", maxHt ),
		Number Col Box( "SD", sdHt ),
		Number Col Box( "Q10", q10Ht )
	)
);

Results from Summarize

You can add totals to the window, as follows:

Summarize(
	tc = Count,
	tsumHt = Sum( :height ),
	tmeanHt = Mean( :height ),
	tminHt = Min( :height ),
	tmaxHt = Max( :height ),
	tsdHt = Std Dev( :height ),
	tq10Ht = Quantile( :height, .10 )
);

Insert Into( a, "Total" );

c = c |/ tc;

sumHt = sumHt |/ tsumHt;

meanHt = meanHt |/ tmeanHt;

minHt = minHt |/ tminHt;

maxHt = maxHt |/ tmaxHt;

sdHt = sdHt |/ tsdHt;

q10Ht = q10Ht |/ tq10Ht;

New Window( "Summary Results",
	Table Box(
		String Col Box( "Age", a ),
		Number Col Box( "Count", c ),
		Number Col Box( "Sum", sumHt ),
		Number Col Box( "Mean", meanHt ),
		Number Col Box( "Min", minHt ),
		Number Col Box( "Max", maxHt ),
		Number Col Box( "SD", sdHt ),
		Number Col Box( "Q10", q10Ht )
	)
);

Summarize with Total

If you do not specify a By group, the result in each name is a single value, as follows:

Summarize(
	// a = By( :age ),
	c = Count,
	sumHt = Sum( :height ),
	meanHt = Mean( :height ),
	minHt = Min( :height ),
	maxHt = Max( :height ),
	sdHt = Std Dev( :height ),
	q10Ht = Quantile( :height, .10 )
);
Show( c, sumHt, meanHt, minHt, maxHt, sdHt, q10Ht );

c = 40;
sumHt = 2502;
meanHt = 62.55;
minHt = 51;
maxHt = 70;
sdHt = 4.24233849397192;
q10Ht = 56.2;

Summarize supports multiple By groups. For example, in Big Class.jmp, proceed as follows:

Summarize(g=by(age, sex), c=count());

show(g, c);

g = {{"12", "12", "13", "13", "14", "14", "15", "15", "16", "16", "17", "17"}, {"F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M"}}

c = [5,3,3,4,5,7,2,5,2,1,1,2]

If you specify a By group, each statistic result is always a matrix with one element per group, and the By name receives a list of the group levels. Otherwise, the results are scalars.
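Because the By name receives a list and each statistic receives a matrix in the same group order, you can walk them in parallel by index. The following is a minimal sketch, assuming Big Class.jmp from the sample data; grp, cnt, and avg are illustrative variable names, not required ones.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// grp receives a list of sex levels; cnt and avg receive matrices in the same order
Summarize( grp = By( :sex ), cnt = Count( :height ), avg = Mean( :height ) );
For( i = 1, i <= N Items( grp ), i++,
	// Char() converts each numeric statistic to a string for the log
	Write( grp[i] || ": n = " || Char( cnt[i] ) || ", mean height = " || Char( avg[i] ) || "\!N" )
);
```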

Create a Table of Summary Statistics

The Summary command creates a new table of summary statistics according to the grouping columns that you specify. Do not confuse Summary with Summarize, which collects summary statistics for a data table and stores them in global variables. See Store Summary Statistics in Global Variables for details.

summDt = dt << Summary(
	Group( groupingColumns ),
	Subgroup( subGroupColumn ),
	Statistic( columns ), // where Statistic is Mean, Min, Max, Std Dev, and so on
	Output Table Name( newName )
);

The following example creates a new table with columns for the mean of height and weight by age, and the maximum height and minimum weight by age:

summDt = dt << Summary(
	Group( :age ),
	Mean( :height, :weight ),
	Max( :height ),
	Min( :weight ),
	Output Table Name( "Height-Weight Chart" )
);

Tip: Output Table Name can take a quoted string or a variable that is a string.
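As the tip notes, the table name can come from a variable. A minimal sketch, assuming Big Class.jmp; tblName and the name it holds are hypothetical.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
tblName = "Mean Height by Age"; // hypothetical name stored in a variable
summDt = dt << Summary(
	Group( :age ),
	Mean( :height ),
	Output Table Name( tblName ) // the variable is evaluated to its string value
);
```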

By default, a summary table is linked to the original data table. If you want to produce a summary that is not linked to the original data table, add this option to your Summary message:

summDt = dt << Summary(
	Group( :age ),
	Mean( :height ),
	Link to Original Data Table( 0 ),
	Output Table Name( "name" )
);

Subset a Data Table

Subset creates a new data table from rows that you specify. If you specify no rows, Subset uses the selected rows. If no rows are selected or specified, it uses all rows. If no columns are specified, it uses all columns. And if Subset has no arguments, the Subset window appears.

dt << Subset(
	Columns( columns ),
	Rows( row matrix ),
	Linked,
	Output Table Name( "name" ),
	Copy Formula( 1 | 0 ),
	Sampling Rate( n ),
	Suppress Formula Evaluation( 1 | 0 )
);

Note: For more arguments, see the Scripting Index.

For example, using Big Class.jmp, to subset all columns for the rows in which the age is 12:

For Each Row(Selected(Rowstate())=(age==12));

subdt = dt << Subset(output table name("subset"));

To select three columns and all rows:

subDt1 = dt << Subset (Columns(Name,Age,Height), Output Table Name("Big Class NAH"));

To subset specified rows of two columns and link the subset table to the original table:

subDt2 = dt << Subset (Columns(Name,Weight),Rows([2,4,6,8]),Linked);

Alternatively, select the rows in which the age is 12 with Select Where, and then subset the selected rows:

dt << Select Where(age==12);

dt << Subset(( Selected Rows ), Output Table Name("subset"));
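Rows() takes a matrix of row numbers, so you can also compute the rows with Get Rows Where and pass them to Subset without selecting anything first. A sketch under that assumption, using Big Class.jmp; r, sub12, and the output table name are illustrative.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
r = dt << Get Rows Where( :age == 12 ); // column matrix of matching row numbers
sub12 = dt << Subset(
	Rows( r ),                   // explicit rows, so the row selection is not consulted
	Columns( :name, :height ),
	Output Table Name( "Age 12 heights" )
);
```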

Subset Data Using the Data Filter

The Data Filter provides a variety of ways to identify subsets of data. Using Data Filter commands and options, you can select complex subsets of data, hide these subsets in plots, or exclude them from analyses.

The basic syntax appears as follows:

dt << Data Filter( <local>, <invisible>, <Add Filter (...)>, <Mode>, <Show Window( 0 | 1 )>, <No Outline Box( 0 | 1 )> );

Options for Data Filter include the following:

Add Filter, Animation, Auto Clear, Clear, Close, Columns, Conditional, Data Table Window, Delete, Delete All, Display, Get Filtered Rows, Location, Match, Mode, Report, Save and Restore Current Row States, Set Select, Set Show, Set Include, Show Column Selector, Show Subset, Use Floating Window

Note: For more arguments, see the Scripting Index.

You can send an empty Data Filter message to a data table, and the initial Data Filter window appears, showing the Add Filter Columns panel that lists all the variables in the data table.

Mode takes three arguments, all of which are optional: Select(bool), Show(bool), Include(bool). These arguments turn on or off the corresponding options. The default value for Select is true (1). The default value for Show and Include is false (0).
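For example, to override these defaults so that the filter shows and includes matching rows without selecting them, pass all three arguments to Mode. A minimal sketch, assuming Big Class.jmp; the filter on sex is only an illustration.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// Turn off Select (default 1) and turn on Show and Include (defaults 0)
df = dt << Data Filter(
	Mode( Select( 0 ), Show( 1 ), Include( 1 ) ),
	Add Filter( Columns( :sex ), Where( :sex == "F" ) )
);
```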

Add Filter adds columns to the filter and builds the WHERE clauses that describe a subset of the data table. The basic syntax appears as follows:

Add Filter( columns( col, ... ), Where( ... ), ... )

To add columns to the data filter, list the column names inside Columns(), separated by commas. Note that this argument is not a list data structure ({...}).

You can define one or more WHERE clauses to specify the filtered columns, as follows:

df = dt << Data Filter(
	Mode( Show( 1 ) ),
	Add Filter(
		Columns( :age, :sex, :height ),
		Where( :age == {13, 14, 15} ),
		Where( :sex == "M" ),
		Where( :height >= 50 & :height <= 65 )
	)
);

This script selects 13-, 14-, and 15-year-old males with a height greater than or equal to 50 and less than or equal to 65. To select the excluded ages in this filtered data instead, send the Invert Selection message to the age filter column after running the preceding script:

df << (Filter Column( :age ) << Invert Selection);

In this example, ages other than 13, 14, and 15 in the filtered data are selected.

You can also use Add Filter to select matching strings from columns with the Multiple Response property:

dt = Open( "$SAMPLE_DATA/Consumer Preferences.jmp" );

df = dt << Data Filter(
	Location( {437, 194} ),
	Add Filter(
		Columns( :Brush Delimited ),
		Match None( Where( :Brush Delimited == {"Before Sleep", "Wake"} ) ),
		Display( :Brush Delimited, Size( 121, 70 ), Check Box Display )
	)
);

This script selects rows with values in the Brush Delimited column that match neither of the specified values ("Before Sleep", "Wake"). Other available scripting options include Match Any, Match All, Match Exactly, and Match Only. See the Using JMP book for details on the options for the Multiple Response property.

You can also send messages to an existing Data Filter object, such as Clear(), Display( ... ), Animate(), and Mode().

Clear takes no arguments and clears the data filter.
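Because Data Filter returns a reference to the filter object, later messages can be sent to that reference. A minimal sketch, assuming Big Class.jmp; the age filter and the Show mode are only illustrations.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
df = dt << Data Filter( Add Filter( Columns( :age ), Where( :age == 12 ) ) );
df << Mode( Show( 1 ) ); // change the mode of the existing filter
df << Clear();           // clear the filter; Clear takes no arguments
```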

To prevent the data filter from appearing when the script is run, set Show Window to 0 as follows:

obj = ( Current Data Table() << Data Filter( Show Window(0) ) );

Define the Context of a Data Filter

A data filter lets you interactively select complex subsets of data, hide the subsets in plots, or exclude them from analyses. Your selections affect all analyses of the data table.

Another option is to filter data from specific platforms or display boxes. Create a local data filter inside the Data Filter Context Box() function. This defines the context as the current platform or display box rather than the data table.

The following example creates a local data filter for a Graph Box. See Local Data Filter and Graph for the output.

// dt, xx, yy, and rows must already be defined, as in the Local Data Filter for Custom Graph.jsl sample script
New Window( "Marker Seg Example",
	Data Filter Context Box(
		V List Box(
			dt << Data Filter( Local ),
			g = Graph Box(
				Frame Size( 300, 240 ),
				X Scale( Min( xx ) - 5, Max( xx ) + 5 ),
				Y Scale( Min( yy ) - 5, Max( yy ) + 5 ),
				Marker Seg( xx, yy, Row States( dt, rows ) )
			)
		)
	)
);

Local Data Filter and Graph

Tip: To experiment with this script, open the Local Data Filter for Custom Graph.jsl sample script.

A context box can also contain several graphs that share one local filter; the filtering then applies to all of them. The following script creates two bubble plots and one data filter in one context box. See Local Filter with Two Bubble Plots for the output.

New Window( "Local Filter with Two Bubble Plots",
	Data Filter Context Box(
		H List Box(
			dt << Data Filter( Local ),
			Platform(
				Current Data Table(),
				Bubble Plot( X( :weight ), Y( :height ), Sizes( :age ) )
			),
			Platform(
				Current Data Table(),
				Bubble Plot( X( :weight ), Y( :age ), Sizes( :height ) )
			)
		)
	)
);

Local Filter with Two Bubble Plots

Tip: To experiment with this script, open the Local Data Filter Shared.jsl sample script.

Data filters can be hierarchical. One script might generate a data filter and a local data filter. The outer data filter for the data table determines which data is available for the local data filter. To illustrate this point, the following script produces both types of data filters as shown in Data Filter Hierarchy:

// data filter for the entire data table
dt << Data Filter(
	Add Filter( Columns( :height ), Where( :height >= 55 & :height <= 65 ) )
);
New Window( "Marker Seg Example",
	// local data filter
	Data Filter Context Box(
		V List Box(
			dt << Data Filter( Local ),
			g = Graph Box(
				Frame Size( 300, 240 ),
				X Scale( Min( xx ) - 5, Max( xx ) + 5 ),
				Y Scale( Min( yy ) - 5, Max( yy ) + 5 ),
				Marker Seg( xx, yy, Row States( dt, rows ) )
			)
		)
	)
);

Heights greater than or equal to 55 and less than or equal to 65 are initially filtered. The user can then work with the local data filter to further filter the data shown in the graph.

Data Filter Hierarchy

Sort a Data Table

Sort rearranges the rows of a table according to the values of one or more columns, either replacing the current table or creating a new table with the results. Specify ascending or descending sort for each By column.

dt << Sort(
	<"Private">, <"Invisible">, <Replace Table>,
	By( columns ),
	Order( Descending | Ascending ),
	<Output Table Name( "name" )>
);

The following example creates a new data table based on Big Class.jmp that sorts the data in descending order by age and then in ascending order by name:

sortedDt = dt << Sort(
	By( :age, :name ),
	Order( Descending, Ascending ),
	Output Table Name( "Sorted age name" )
);
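To sort the existing table in place instead of creating a new one, add the Replace Table option from the syntax. A minimal sketch, assuming Big Class.jmp:

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// Replace Table reorders the rows of dt itself; no new table is created
dt << Sort( By( :height ), Order( Descending ), Replace Table );
```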

You can also use an associative array to sort values in a column. The associative array gives you a quick look at unique values in a column. For details, see Sort a Column’s Values in Lexicographic Order in Data Structures.

Stack Values in a Data Table

Stack combines values from several columns into one column.

dt << Stack(
	Columns( columns ), // the columns to stack together
	Source Label Column( "name" ), // column that identifies the source column of each value
	Stacked Data Column( "name" ), // name for the new stacked column
	Keep( columns ), // the columns to keep in the new data table
	Drop( columns ), // the columns to drop from the new data table
	Output Table( "name" ) // name for the new data table
);

For example, where dt is a reference to Big Class:

StackedDt = dt << Stack(
	Columns( :weight, :height ),
	Source Label Column( "ID" ),
	Stacked Data Column( "Y" ),
	Name( "Non-stacked columns" )( Keep( :age, :sex ) ),
	Output Table( "Stacked Table" )
);

The Columns(columns) argument can take a list of columns, or an expression that evaluates to a list.
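For example, the columns to stack can be held in a variable. This sketch relies on the behavior described above (Columns() accepting an expression that evaluates to a list) and uses Big Class.jmp; stackCols and stackedFromList are illustrative names.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
stackCols = {:weight, :height}; // list of column references to stack
stackedFromList = dt << Stack(
	Columns( stackCols ),
	Source Label Column( "ID" ),
	Stacked Data Column( "Y" ),
	Output Table( "Stacked from list" )
);
```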

Split Values in a Stacked Data Table

Split breaks a stacked column into several columns.

dt << Split(

Split(columns), // the column to split (required)

Split by(column), // the column to split by (required)

Group(column), // splits data within groups

<Private>|<Invisible>, // resulting table is private or invisible

Remaining Columns( Keep All | Drop All | Select(columns)), // specify what to do with the remaining columns in the resulting table (Keep All by default)

<Copy formula(0|1)>, // includes column formulas from the source table in the resulting table (default is 1, true)

<Suppress formula evaluation(0|1)>, // stops any copied formulas from being evaluated (default is 1, true)

Sort by Value Order, // sorts the order of resulting columns by a Value Ordering property (must be specified on the Split by column)

Output Table ("name")); //generates the output to the table name specified

The following example reverses the previous example for Stack, returning essentially the original table, except that the height and weight columns now appear in alphabetic order:

SplitDt = StackedDt << Split(
	Split( :Y ),
	Split By( :ID ),
	Output Table( "Split" )
);

Transpose a Data Table

Transpose creates a new data table by flipping a data table on its side, interchanging rows for columns and columns for rows. If you specify no rows, Transpose uses the selected rows. If no rows are selected, it uses all rows.

dt << Transpose(
	<"private">, <"invisible">,
	Columns( columns ),
	Rows( row matrix ),
	By( column ),
	Label Column Name( "name" ),
	Output Table Name( "name" ) // or Output Table( "name" )
);

The following example transposes the height and weight columns in the Big Class.jmp sample data table:

tranDt = dt << Transpose(Columns(Height,Weight),

output table name("Transposed Columns"));

Note: The simple transpose command dt << Transpose brings up the Transpose window. If you do not want the window to appear, invoke the transpose as dt << Transpose(no option).
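Because Transpose falls back on the selected rows when no Rows() argument is given, selecting rows first limits the transposed table to those rows. A minimal sketch, assuming Big Class.jmp; tDt and the output table name are illustrative.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt << Select Where( :sex == "F" ); // no Rows() below, so these selected rows are used
tDt = dt << Transpose(
	Columns( :height, :weight ),
	Output Table Name( "Transposed F" )
);
```

Each transposed column becomes one row of tDt, so the new table has two rows.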

Vertically Concatenate Data Tables

Concatenate, also known as a vertical join, combines rows of several data tables top to bottom.

dt << Concatenate( DataTableReferences,...,Keep Formulas,

Output Table Name("name"));

For example, if you have subsetted tables for males and females, you can put them back together using Concatenate:

dt << Select Where(sex=="M"); m=dt << Subset(output table name("M"));

dt << Invert Row Selection; f=dt << Subset(output table name("F"));

both=m << Concatenate(f,output table name("Both"));

Or, instead of creating a new table containing all the concatenated data, you can append all the data to the current data table:

dt << Concatenate(DataTableReferences, Append to first table);

Horizontally Concatenate Data Tables

Join, also known as horizontal join or concatenate, combines data tables side to side.

dt << Join( // message to first table

With(dataTable), // the other data table

Select(columns), // optional column selection, or use ALL

Select With(columns), // optional column selection, or use ALL

By Row Number, // join type; alternatives are Cartesian Join or

// By Matching Columns(col1==col2,col),

// Update first table with data from second table,

// Merge same name columns,

Copy formula(0), // on by default; 0 turns it off

Suppress formula evaluation(0), // on by default; 0 turns it off

// options for each table:

Drop Multiples(Boolean, Boolean),

Include NonMatches(Boolean, Boolean),

Preserve Main Table Order(), // maintains the order of the original data

Output Table Name("name")); // the resulting table

To try this, first break Big Class.jmp into two parts:

part1=dt << Subset(Columns(Name, Age, Height), Output Table Name("NAH_Big Class"));

part2=dt << Subset(Columns(Name, Sex, Weight), Output Table Name("NSW_Big Class"));

To make it a realistic experiment, rearrange the rows in part 2:

sortedPart2=part2 << sort(by(name), Output Table Name("SortedNSW_Big Class"));

Now you have a data set in two separate chunks, and the rows are not in the same order, but you can join them together by matching on the column that the two chunks have in common.

joinDt = part1 << Join(
	With( sortedPart2 ),
	By Matching Columns( :name == :name ),
	Preserve Main Table Order(),
	Output Table Name( "Joined Parts" )
);

The resulting table has two copies of the name variable, one from each part, and you can inspect these to see how Join worked. Notice that you now have four Robert rows, because each part had two Robert rows (there were two Roberts in the original table) and Join formed all possible combinations.

Tip: To maintain the order of the original data table in the joined table (instead of sorting by the matching columns), include Preserve Main Table Order(). This function speeds up the joining process.

Replace Data in Data Tables

Update replaces data in one table with data from a second table.

Note: Merge Update is an alias for Update.

dt << Update( // message to first table
	With( dataTable ), // the other data table
	By Row Number, // default join type; alternative is
	// By Matching Columns( col1 == col2 )
	Ignore Missing // optional; does not replace values with missing values
);

To try this, make a subset of Big Class.jmp, as follows:

NewHt = dt << Subset( Columns( :name, :height ), Output Table Name( "hts" ) );

Next, add 0 to 6 inches to each student’s height in the subset table:

Current Data Table( NewHt );
For Each Row( :height += 6 * Random Uniform() ); // a new random increment for each row

Finally, update the heights of students in Big Class.jmp with the new heights from the subset table:

dt << Update(
	With( NewHt ),
	By Matching Columns( :name == :name )
);

Controlling the Columns Added to an Updated Table

Your updated table might contain more columns than your original table. You can select which columns are included in your updated table by using the Add Columns from Update Table() option.

To add no additional columns:

Data Table( "table" ) << Update(
	With( Data Table( "update data" ) ),
	Match Columns( :ID = :ID ),
	Add Columns from Update Table( None )
);

To add some columns:

Data Table( "table" ) << Update(
	With( Data Table( "update data" ) ),
	Match Columns( :ID = :ID ),
	Add Columns from Update Table( :col1, :col2, :col3 )
);

Create a Table Using Tabulate

Tabulate constructs tables of descriptive statistics. The tables are built from grouping columns, analysis columns, and statistics keywords. The following example creates a table containing the standard deviation and mean for the height and weight of male and female students:

dt << Tabulate( // message to data table

Add Table( // start a new table

Column Table( Grouping Columns( :sex ) ),

// group using the column sex

Row Table( // add rows to the table

Analysis Columns( :height, :weight ),

// use the height and weight columns for the analysis

Statistics( Std Dev, Mean ) // show the standard deviation and mean

)));

You can apply a transformation to a column in the Tabulate table and set the format of the column at the same time. Use the Transform Column() function inside Analysis Column(). For example, the following script applies the Log transformation to height and sets the column format for the Mean and % of Total columns.

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt << Tabulate(
	Set Format(
		Mean(
			Analysis Column(
				Transform Column( "Log[height]", Formula( Log( :height ) ) )
			)( 10, "Best" )
		),
		Name( "% of Total" )( :height( 12, 2 ) )
	),
	Add Table(
		Column Table(
			Analysis Columns(
				Transform Column( "Log[height]", Formula( Log( :height ) ) )
			),
			Statistics( Mean, Name( "% of Total" ) )
		),
		Row Table( Grouping Columns( :sex ) )
	)
);

Note: You can also change the width of columns and perform additional operations within Tabulate. For details, see the JMP Scripting Index.

Find Missing Data Patterns

If your data table contains missing data, you might want to see whether there is a pattern that the missing data creates.

dt << Missing Data Pattern( // message to data table
	Columns( :miss ), // find missing data in this column
	Output Table( "Missing Data Pattern" ) // name the output table
);

Compare Data Tables

JMP can compare two open data tables and report the differences between data, scripts, table variables, column names, column properties, and column attributes. To compare data tables using JSL:

obj = dt << Compare Data Tables(
	Compare With( Data Table( "Data Table Name" ) )
);

For example, to compare the data tables Students1.jmp and Students2.jmp, proceed as follows:

dt = Open( "$SAMPLE_DATA/Students1.jmp" );

dt2 = Open( "$SAMPLE_DATA/Students2.jmp" );

obj = dt << Compare Data Tables( Compare With( Data Table( "Students2" ) ) );

To view the difference summary matrix for the comparison results, send the following message:

mtx = (obj << Get Difference Summary matrix);

The resulting matrix appears in the log window.

In the difference summary matrix, the first column represents the action performed on a column: -1 for a Delete, 0 for a Replace, and 1 for an Add. The second column represents the number of rows affected by the action. The third and fourth columns represent the row numbers affected in the two data tables, in their respective order (Students1.jmp, Students2.jmp).
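To read the difference summary matrix programmatically, you can translate the action code in each row. A sketch, assuming the Students1.jmp and Students2.jmp comparison above and the column meanings just described; action is an illustrative variable name.

```jsl
dt = Open( "$SAMPLE_DATA/Students1.jmp" );
dt2 = Open( "$SAMPLE_DATA/Students2.jmp" );
obj = dt << Compare Data Tables( Compare With( Data Table( "Students2" ) ) );
mtx = obj << Get Difference Summary Matrix;
// Column 1 holds the action code; column 2 holds the number of affected rows
For( i = 1, i <= N Rows( mtx ), i++,
	action = Match( mtx[i, 1], -1, "Delete", 0, "Replace", 1, "Add", "Unknown" );
	Write( action || ": " || Char( mtx[i, 2] ) || " row(s)\!N" );
);
```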

Create a Summary Table

A Summary table includes summary statistics for a data table, such as the mean, median, standard deviation, minimum value, and maximum value. By default, the summary table is linked to its source table. When you select rows in the summary table, the corresponding rows are highlighted in its source table.

dt << Summary( <private>, <invisible>, <Group( column )>, <Weight( column )>, <Freq( column )>, <N>, <Mean( column )>, <Std Dev( column )>, <Min( column )>, <Max( column )>, ... );

Include columns for any statistic listed in the Summary launch window.

The following example creates a summary table from the Fitness.jmp sample data table. The N Rows column, which specifies the number of rows for each level, is included in the Summary table by default. This script creates a column of the mean for Runtime by Sex.

dt = Open( "$SAMPLE_DATA/Fitness.jmp" );
dt << Summary( Group( :Sex ), Mean( :Runtime ) );

You can override the preselected Freq or Weight columns in the data table by specifying the Freq or Weight columns in the script. To omit the preselected Freq or Weight, use the keyword "none".

dt = Open( "$SAMPLE_DATA/Fitness.jmp" );
dt << Summary( Group( :Sex ), Mean( :Runtime ), Freq( "None" ), Weight( "None" ) );

To unlink the Summary table, include Link to Original Data Table( 0 ) in the script.

dt = Open( "$SAMPLE_DATA/Fitness.jmp" );
dt << Summary(
	Group( :Sex ),
	Mean( :Runtime ),
	Link to Original Data Table( 0 )
);

Subscribe to a Data Table

If you want to receive a message when a data table changes, use the Subscribe message. For example, you might want a message sent to the log when columns are added or deleted.

The basic syntax is as follows:

The first argument is a name for the subscription (or “client”), so that it can also be removed.

Applications can also subscribe to a data table as clients; most built-in platforms, such as Distribution, do so. If a data table has clients when it is closed, the user is warned that open applications might still need the data table.

Each subscription remains in effect until you unsubscribe. Unsubscribe as follows:

The following example sends messages to the log when a row is added or deleted:

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// runs when rows are deleted
delRowsFn = Function( {a, b, rows},
	dtname = (a << Get Name());
	Print( dtname );
	Print( b );
	Print Matrix( rows );
);
// runs when rows are added
addRowsFn = Function( {a, b, insert},
	dtname = (a << Get Name());
	Print( dtname );
	Print( b );
	Print( insert );
);
dt << Subscribe( "Test Delete", onDeleteRows( delRowsFn, 3 ) );
dt << Subscribe( "Test Add", onAddRows( addRowsFn, 3 ) );

Empty Application Names

If Subscribe is called with an empty application name, JMP generates a unique name that is returned to the caller. In the following example, appname2 is subscribed to the data table as a client. JMP asks the user to confirm when trying to close the data table.

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
appname1 = dt << Subscribe( "", On Close( Print( "Closing Data Table" ) ) );
appname2 = dt << Subscribe(
	""("client"),
	On Close(
		Function( {dtab},
			dtname = (dtab << Get Name());
			Print( dtname );
		)
	)
);