Publication date: 04/12/2021

Compare Data Tables

JMP can compare two open data tables and report the differences between data, scripts, table variables, column names, column properties, and column attributes. The numbers of columns and rows in each data table are shown at the top of the Compare Data Tables window. In this example, the sample data tables Popcorn and are compared.

Figure 4.22 Basic Information about Data Tables 


Specify Matching Columns

Columns that have the same name are matched automatically, and a line is drawn between each pair of matched columns. You can also link two columns manually.
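The matching rule described above (automatic pairing by name, plus manual links) can be pictured with a short Python sketch. This is not how JMP implements it. The function `match_columns`, its `manual_links` parameter, and the column lists are hypothetical and used only for illustration. The sketch also assumes that a column already paired by name is not linked a second time.

```python
def match_columns(left_cols, right_cols, manual_links=()):
    """Pair columns from two tables (illustrative sketch, not JMP's code).

    Same-name columns are paired automatically; each (left, right) tuple in
    manual_links adds an extra pair, similar to clicking Link.
    """
    right_set = set(right_cols)
    # Automatic matching: identical names in both tables.
    pairs = [(name, name) for name in left_cols if name in right_set]
    paired_left = {left for left, _ in pairs}
    paired_right = {right for _, right in pairs}
    for left, right in manual_links:
        # Assumption: only link columns that exist and are not already paired.
        if (left in left_cols and right in right_set
                and left not in paired_left and right not in paired_right):
            pairs.append((left, right))
            paired_left.add(left)
            paired_right.add(right)
    return pairs


# Hypothetical column lists for two tables.
left_cols = ["popcorn", "oil amt", "batch", "yield"]
right_cols = ["popcorn", "oil amt", "batch", "yield1"]
print(match_columns(left_cols, right_cols, manual_links=[("yield", "yield1")]))
```

Without the manual link, `yield` and `yield1` would stay unmatched because their names differ.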

1. Select Help > Sample Data Library and open and Popcorn

2. Show Popcorn and select Tables > Compare Data Tables.

3. From the With list, select

Popcorn should automatically be selected from the Compare list.

4. In the Match Columns pane, select yield and yield1 and then click Link.

Figure 4.23 Manually Linked Columns 


Columns that have the same name are automatically linked.

Note: If you click the Compare Data icon, the linked columns are not compared. The values for that pair of columns still appear in the results, but they are not compared.

5. Click Compare.

The results are shown in the Data report.

Figure 4.24 Comparing Columns 


The first eight rows are the same, so they are not shown. The remaining rows (colored blue) are only in the second table.

6. In the Data report, deselect Hide rows with no differences.

Figure 4.25 Showing Rows with No Differences 


The first eight rows, which are the same in both data tables, are now shown.

7. Deselect Hide columns with no differences.

Figure 4.26 Showing Columns with No Differences 


Columns that are in both data tables and contain matching data are now shown. You might deselect this option to give more context to the matched data.

Note: If the data in a cell is not completely shown, select the text and then view the data in the Cell Data box.

Data Table Compare Options

Align Data Options

Flexible by Row

Searches for common rows to align. Consider this option for smaller data tables if you think that the data tables are nearly the same.

By Row

Compares rows one by one. Consider this option if you already know that the rows line up row by row; the comparison runs much faster. You might also select this option if the comparison is taking too long with the default Flexible by Row option.

Use ID Columns

Uses the selected ID column to compare rows. The rows of the data table are uniquely identified by the values of the ID column. Consider this option if the data tables are large, sorted differently, or have missing rows. You can select more than one column.
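The three alignment options above trade speed for robustness. The Python sketch below illustrates the idea with hypothetical rows stored as tuples. It is not JMP's algorithm: `difflib.SequenceMatcher` stands in for "searches for common rows to align", and the function names `by_row`, `flexible_by_row`, and `by_id` are made up for this example. It shows why a single inserted row makes By Row report every later row as different, while the flexible and ID-based approaches isolate the one extra row.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def by_row(left, right):
    """By Row sketch: compare row i of one table with row i of the other.

    Extra rows in the longer table are paired with None.
    """
    return [(i, l, r)
            for i, (l, r) in enumerate(zip_longest(left, right))
            if l != r]


def flexible_by_row(left, right):
    """Flexible by Row sketch: find common runs of rows, then report the rest.

    Each entry is (tag, rows_from_left, rows_from_right), where tag is
    "insert", "delete", or "replace" as defined by difflib.
    """
    matcher = SequenceMatcher(None, left, right, autojunk=False)
    return [(tag, left[i1:i2], right[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def by_id(left, right, id_cols):
    """Use ID Columns sketch: pair rows by the values in id_cols, ignoring order.

    Returns (rows only in left, rows only in right, changed row pairs).
    """
    def index(rows):
        idx = {}
        for row in rows:
            key = tuple(row[c] for c in id_cols)
            if key in idx:
                raise ValueError(f"ID {key!r} does not uniquely identify a row")
            idx[key] = row
        return idx

    li, ri = index(left), index(right)
    only_left = [row for key, row in li.items() if key not in ri]
    only_right = [row for key, row in ri.items() if key not in li]
    changed = [(row, ri[key]) for key, row in li.items()
               if key in ri and row != ri[key]]
    return only_left, only_right, changed


# Hypothetical rows: (ID, value). The right table has one inserted row.
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "a"), (9, "x"), (2, "b"), (3, "c")]
print(by_row(left, right))           # three differences after the insertion
print(flexible_by_row(left, right))  # [('insert', [], [(9, 'x')])]
```

The ID-based version builds a dictionary per table, so it tolerates a different sort order and missing rows, which matches the cases in which the documentation suggests Use ID Columns.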

Fuzzy Compare Options

Ignore missing

Ignores missing data.

Allow Relative Error

Enables you to specify the relative error rate for numeric data. The numeric values are considered equal if they are within the relative error rate that you specify. The smaller the relative error rate, the more precise the comparison.

Ignore case

Disregards case when comparing text.

Ignore whitespace

Disregards whitespace when comparing text.
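A minimal Python sketch of how the fuzzy options above could combine into one equality test. The exact rules JMP uses are not stated here, so the following are assumptions: "Ignore missing" treats a missing value as equal to anything; the relative error check follows Python's `math.isclose` definition, |a − b| ≤ rel_error × max(|a|, |b|); and "Ignore whitespace" removes all whitespace characters rather than collapsing them. The function name `fuzzy_equal` is hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values (assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fuzzy_equal(a, b, *, ignore_missing=False, rel_error=0.0,
                ignore_case=False, ignore_whitespace=False):
    """Compare two cell values under the fuzzy options (illustrative sketch)."""
    if is_missing(a) or is_missing(b):
        # Assumption: with ignore_missing, a missing cell never counts as a difference.
        if ignore_missing:
            return True
        return is_missing(a) and is_missing(b)
    if isinstance(a, str) and isinstance(b, str):
        if ignore_whitespace:
            # Assumption: drop every whitespace character.
            a, b = "".join(a.split()), "".join(b.split())
        if ignore_case:
            a, b = a.casefold(), b.casefold()
        return a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # rel_error=0.0 reduces to an exact comparison.
        return math.isclose(a, b, rel_tol=rel_error)
    return a == b


print(fuzzy_equal(100.0, 100.05, rel_error=1e-3))  # within 0.1 percent
print(fuzzy_equal(100.0, 100.05, rel_error=1e-4))  # outside 0.01 percent
print(fuzzy_equal("Gourmet ", "gourmet", ignore_case=True, ignore_whitespace=True))
```

The example also illustrates the point made under Allow Relative Error: the smaller the rate, the stricter the comparison, so 100 and 100.05 match at 1e-3 but not at 1e-4.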

Data Options

Show fuzzy differences

Shows differences in numeric and string data that are approximately the same. This option works with the value in the Relative Error field to remove insignificant numeric differences, and it also applies to string data that are fuzzily the same.

Hide columns with no differences

Shows or hides all identical linked column pairs in the comparison table.

Hide rows with no differences

Shows or hides rows that contain matching data.
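The two Hide options act as filters on the comparison table. The Python sketch below assumes a hypothetical layout in which each row is a list of (left value, right value) pairs, one pair per linked column. The function `hide_no_difference` and the sample values are made up for illustration and do not reflect JMP's internal data structures.

```python
def hide_no_difference(columns, rows, hide_rows=True, hide_columns=True):
    """Filter a comparison table (illustrative sketch).

    columns: names of linked column pairs.
    rows: list of rows; each row is a list of (left_value, right_value)
    pairs in the same order as columns.
    """
    # Keep a column if hiding is off or if any row differs in that column.
    keep = [c for c in range(len(columns))
            if not hide_columns or any(row[c][0] != row[c][1] for row in rows)]
    # Keep a row if hiding is off or if any of its cell pairs differ.
    shown_rows = [row for row in rows
                  if not hide_rows or any(a != b for a, b in row)]
    return [columns[c] for c in keep], [[row[c] for c in keep] for row in shown_rows]


# Hypothetical comparison data for two linked columns.
columns = ["batch", "yield"]
rows = [
    [("large", "large"), (8.2, 8.2)],
    [("small", "small"), (6.3, 6.4)],
]
print(hide_no_difference(columns, rows))                   # only the differing cell
print(hide_no_difference(columns, rows, hide_rows=False))  # all rows, yield column only
```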

Compare Table Properties

Click the red triangle and select Compare Table Properties to see differences in table scripts and variables. Figure 4.27 shows that the table variables and scripts are different. To see a full variable or script, select its line and view it in the Selected Metadata box.

In this example, both notes differ, and the reference variable and scripts are only in The Notes variable is selected so that you can see its contents in both data tables in the Selected Metadata box. The red shading indicates that the text is only in The blue shading indicates that the text is only in Popcorn.

Figure 4.27 Different Table Variables 


Deselect Show Diff to see the name of each data table and the complete contents of each Notes variable instead of only showing the differences.

Shortest run is the minimum number of consecutive characters that must be the same in both files before those characters are declared a common subsegment. A common subsegment has no background color because it is present in both files. The shortest run is set to 3 so that very short matches, which are typically unhelpful, are not shown as common. For example, with a shortest run of 1, any single character that appears in both files could match. The result is many very short segments of common and differing text, which is usually hard to read.
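To make the shortest-run rule concrete, here is a small Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for JMP's text differencing, which is an assumption: JMP's own algorithm may align text differently. The function name `common_segments` is hypothetical. Matching runs shorter than `shortest_run` are discarded, so their characters would be displayed as differences rather than as common text.

```python
from difflib import SequenceMatcher


def common_segments(a, b, shortest_run=3):
    """Return aligned substrings common to a and b, keeping only runs of at
    least shortest_run characters (illustrative sketch using difflib).
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # With shortest_run >= 1, the filter also drops difflib's final size-0 block.
    return [a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks()
            if m.size >= shortest_run]


print(common_segments("cup size", "cap size", shortest_run=1))  # ['c', 'p size']
print(common_segments("cup size", "cap size", shortest_run=3))  # ['p size']
```

With a shortest run of 1, the lone shared "c" counts as common text, splitting "cup" into fragments; raising the threshold to 3 keeps only the meaningful shared run.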

Compare Column Attributes and Properties

Click the red triangle and select Compare Column Attributes and Properties to see differences in column notes, value colors, and the like.

Figure 4.28 shows that column notes are different in and Popcorn. The popcorn column is selected so that you can see the complete note and differences between the two notes in the Selected Metadata box.

Figure 4.28 Comparing Column Attributes and Properties 

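Conceptually, comparing column attributes and properties amounts to comparing, for each matched column, a set of named properties. The Python sketch below assumes a hypothetical representation in which each table is a dictionary mapping column names to property dictionaries. The function `diff_column_properties`, the property names, and the note text are illustrative only and are not read from the sample data tables.

```python
def diff_column_properties(left, right):
    """Compare per-column property dictionaries for columns in both tables.

    left and right map column name -> {property name: value}. The result maps
    each column with differences to {property: (left value, right value)},
    using None when a property is absent from one table.
    """
    report = {}
    for col in left:
        if col not in right:
            continue  # only columns present in both tables are compared
        lp, rp = left[col], right[col]
        changes = {name: (lp.get(name), rp.get(name))
                   for name in sorted(lp.keys() | rp.keys())
                   if lp.get(name) != rp.get(name)}
        if changes:
            report[col] = changes
    return report


# Hypothetical column properties for two tables.
left = {"popcorn": {"Notes": "Type of popcorn"}, "batch": {"Notes": "Batch size"}}
right = {"popcorn": {"Notes": "Popcorn type: plain or gourmet", "Value Colors": "custom"},
         "batch": {"Notes": "Batch size"}}
print(diff_column_properties(left, right))
```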
