Publication date: 11/10/2021

# Statistical Functions

Arc Finder(X(col), Y(col), Group(lot, wafer))

Description

Finds arcs in the point data and creates a new column that identifies the arcs.

Example

`dt = Open( "$SAMPLE_DATA/Wafer Stacked.jmp" );`
`Arc Finder(`
`	Group( :Lot, :Wafer ),`
`	X( :X_Die ),`
`	Y( :Y_Die ),`
`	Min Distance( 12 ), // minimum distance among 3 points to seed an arc`
`	Min Radius( 15 ), // minimum radius of the acceptable arc`
`	Max Radius( 2000 ), // maximum radius of acceptable arc`
`	Max Radius Error( 2 ), // how close a point must be to the arc to be added`
`	Min Arc Points( 5 ), // how many points to define an arc`
`	Number of Searches( 500 ), // how many random probes of data`
`	Max Number Arcs( 3 ) // number of arcs searched for`
`);`
`dt << Color or Mark by Column( :Arc Number );`
`dt << Graph Builder(`
`	Size( 1539, 921 ),`
`	Variables( X( :X_Die ), Y( :Y_Die ), Wrap( :Lot_Wafer Label ), Color( :Arc Number ) ),`
`	Elements( Points( X, Y, Legend( 6 ) ) )`
`);`

Notes

The function is scaled for data that have a range of 30 to 50 units.

The function is suitable only for data that are subset to the interesting defect points.

It is not suitable when the density of points is high.

ARIMA Forecast(column, length, model, estimates, from, to)

Description

Determines the forecasted values for the specified rows of the specified column using the specified model and estimates.

Returns

A vector of forecasted values for column within the range defined by from and to.

Arguments

column

A data table column.

length

Number of rows within the column to use.

model

Messages for Time Series model options.

estimates

A list of named values that matches the messages sent to ARIMA Forecast(). If you perform an ARIMA Forecast and save the script, the estimates are part of the script.

from, to

Define the range of values. Typically, from is between 1 and to, inclusive. If from is less than or equal to 0, and if from is less than or equal to to, the results include filtered predictions.

Best Partition(xindices, yindices, <<Ordered, <<Continuous Y, <<Continuous X)

Description

Experimental function to determine the optimal grouping.

Returns

A list.

Arguments

xindices, yindices

Same-dimension matrices.

Col Cumulative Sum(name, <By var, ...>)

Cumulative Sum(name)

Description

Returns the cumulative sum for the current row. Col Cumulative Sum supports By columns, which do not need to be sorted.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.
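As a minimal sketch of the By behavior described above, the following script builds a small table and adds two formula columns. The table name, the column names g, x, running, and running by g, and the data values are all hypothetical.

```jsl
// Hypothetical four-row table; the By column g is deliberately unsorted
dt = New Table( "Cumulative Sum Demo",
	New Column( "g", Character, Nominal, Set Values( {"a", "b", "a", "b"} ) ),
	New Column( "x", Numeric, Continuous, Set Values( [1, 2, 3, 4] ) )
);
// Running total over all rows: 1, 3, 6, 10
dt << New Column( "running", Numeric, Formula( Col Cumulative Sum( :x ) ) );
// Running total accumulated separately within each level of g: 1, 2, 4, 6
dt << New Column( "running by g", Numeric, Formula( Col Cumulative Sum( :x, :g ) ) );
```

Because the By column is interleaved, the second column shows that the groups are accumulated independently without sorting the table first.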

Col Maximum(name, <By var, ...>)

Col Max(name)

Description

Calculates the maximum value across all rows of the specified column. The result is internally cached to speed up multiple evaluations.

Returns

The maximum value that appears in the column.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.
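The following sketch shows one way to use the By argument in a column formula, assuming the Big Class sample table is available. The new column names are illustrative.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// Overall maximum: every row receives the same value
dt << New Column( "max height", Numeric, Formula( Col Maximum( :height ) ) );
// Maximum within each level of :sex: each row receives its own group's maximum
dt << New Column( "max height by sex", Numeric, Formula( Col Maximum( :height, :sex ) ) );
```

The same pattern should apply to the other Col functions in this section that accept By variables.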

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Mean(name, <By var, ...>)

Description

Calculates the mean across all rows of the specified column. The result is internally cached to speed up multiple evaluations.

Returns

The mean of the column.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Median(name, <By var, ...>)

Description

Calculates the median across all rows of the specified column. The ordering is cached internally to speed up multiple evaluations.

Returns

The median of the column.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Minimum(name, <By var, ...>)

Col Min(name)

Description

Calculates the minimum value across all rows of the specified column. The result is internally cached to speed up multiple evaluations.

Returns

The minimum value that appears in the column.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Moving Average(name, options, <By var, ...>)

Moving Average(name, options)

Description

Returns the moving average over a given interval based at the current row. Col Moving Average supports By columns.

Arguments

name

A column name.

Weighting(1|0|n)

Required positional argument. Determines how the values are weighted. 1 indicates uniform weighting. 0 indicates incremental weighting (a ramp or triangle). Any other number is the parameter for an exponential moving average (EWMA or EMA).

Before(1|0|n)

Positional argument. Controls the size of the range (or window) by including the specified number of items before the current item in the average (in addition to the current item). The default value, -1, means all of the preceding items.

After(1|0|n)

Positional argument. Controls the size of the range (or window) by including the specified number of items after the current item in the average (in addition to the current item). The default value, 0, means no following items.

Partial Window is Missing

Boolean positional argument. Controls how missing values are treated. By default, missing values are ignored. 0 computes the average of partial windows.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.

Examples

`// equal weighting of a five-item lagging range`
`Col Moving Average( x, 1, 4 );`
` `
`// ramp weighting of all preceding items`
`Col Moving Average( x, 0 );`
` `
`// triangle weighting of a five-item centered range`
`Col Moving Average( x, 0, 2, 2 );`
` `
`// exponential weighting of all preceding items`
`Col Moving Average( x, 0.25 );`

Col N Missing(name, <By var, ...>)

Description

Calculates the number of missing values across all rows of the specified column. The result is internally cached to speed up multiple evaluations.

Returns

The number of missing values in the column.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Number(name, <By var, ...>)

Description

Calculates the number of nonmissing values across all rows of the specified column. The result is internally cached to speed up multiple evaluations.

Returns

The number of nonmissing values in the column.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Quantile(name, p, <ByVar>)

Description

Calculates the specified quantile p across all rows of the specified column. The result is internally cached to speed up multiple evaluations.

Returns

The value of the quantile.

Arguments

name

A column name.

p

A specified quantile p between 0 and 1.

ByVar

(Optional) A By group.

Example

`dt = Open( "$SAMPLE_DATA/Big Class.jmp" );`
`Col Quantile( :height, .5 );`

63

63 is the 50th percentile, or the median, of all rows in the height column.

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Rank(column, <ByVar, ...>, <<tie("average"|"arbitrary"|"row"|"minimum"))

Description

Ranks each row’s value, from 1 for the lowest value to the number of rows for the highest value. Ties are broken arbitrarily unless the <<tie option specifies another method.

Arguments

column

The column to be ranked.

ByVar

(Optional) A By variable to compute statistics across groups of rows.

<<tie

Determines how ties are broken. A tie occurs when the values being ranked are the same. For the data [33 55 77 55], 33 has rank 1 and 77 has rank 4, and the question is how to rank the two 55s.

average reports the average of the possible ranks, 2.5, for both 55s.

arbitrary matches JMP 12 behavior by assigning the possible ranks in an unspecified order, which could be 2 and 3 or 3 and 2.

row assigns the ranks in the order in which the values originally appear. (The first 55 gets rank 2 and the second 55 gets rank 3.)

minimum gives both values the lowest possible rank, 2.
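As a hedged illustration, the tie rules can be compared side by side on the same four values used in the description. The table and column names below are hypothetical, and the ranks in the comments are the ones stated in the description.

```jsl
// Hypothetical one-column table holding the data [33 55 77 55]
dt = New Table( "Rank Demo",
	New Column( "v", Numeric, Continuous, Set Values( [33, 55, 77, 55] ) )
);
// average: the two 55s share rank 2.5, giving 1, 2.5, 4, 2.5
dt << New Column( "rank average", Numeric, Formula( Col Rank( :v, <<tie( "average" ) ) ) );
// row: tied values are ranked in row order, giving 1, 2, 4, 3
dt << New Column( "rank row", Numeric, Formula( Col Rank( :v, <<tie( "row" ) ) ) );
// minimum: both 55s get the lowest possible rank, giving 1, 2, 4, 2
dt << New Column( "rank minimum", Numeric, Formula( Col Rank( :v, <<tie( "minimum" ) ) ) );
```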

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Simple Exponential Smoothing(column, alpha, <ByVar> )

Description

Returns the simple exponential smoothing prediction for the current row using smoothing weight alpha.

Arguments

column

The column of time series observations.

alpha

The smoothing weight.

ByVar

(Optional) A By variable to compute predictions across groups of rows. By variables do not need to be presorted.

Notes

The predicted value for row t is given by the following:

`Predicted[t] = alpha * Observed[t-1] + (1-alpha) * Predicted[t-1]`

By definition, Predicted[1] = Observed[1].
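The recursion can be traced by hand. The following sketch does not call Col Simple Exponential Smoothing(); it re-implements the formula on a hypothetical four-value series, assuming that the first prediction is seeded with the first observation.

```jsl
// Hand-rolled recursion for illustration only; the series and alpha are hypothetical
observed = [10, 12, 11, 15];
alpha = 0.5;
n = N Rows( observed );
predicted = J( n, 1, . ); // column vector of missing values
predicted[1] = observed[1]; // assumed seed for the first row
For( t = 2, t <= n, t++,
	predicted[t] = alpha * observed[t - 1] + (1 - alpha) * predicted[t - 1]
);
Show( predicted ); // predicted = [10, 10, 11, 11];
```

Each prediction uses only earlier observations, so the last observed value (15) does not affect any prediction in this short series.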

Col Standardize(name, <By var, ...>)

Description

Standardizes the column: subtracts the column mean from each value and divides the result by the column standard deviation, both computed across all rows of the specified column.

Returns

The standardized value for the current row.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. If a By variable is specified, the values are standardized against the mean and standard deviation of their corresponding By variable group.

Notes

Standardizing centers the variable by its sample mean and scales it by its sample standard deviation. Thus, the following two column formulas are equivalent:

`dt = Open( "$SAMPLE_DATA/Big Class.jmp" );`
`dt << New Column( "stdht", Formula( Col Standardize( height ) ) );`
`dt << New Column( "stdht2",`
`	Formula( (height - Col Mean( height )) / Col Std Dev( height ) )`
`);`

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Std Dev(name, <By var, ...>)

Description

Calculates the standard deviation across rows in a column. The result is internally cached to speed up multiple evaluations.

Returns

The standard deviation.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Col Sum(name, <By var, ...>)

Description

Calculates the sum across rows in a column. Calculating all missing values (Col Sum(.,.)) returns missing. The result is internally cached to speed up multiple evaluations.

Returns

The sum.

Arguments

name

A column name.

By var

(Optional) A By variable to compute statistics across groups of rows. Use the By variable in a column formula or in a For Each Row() function.

Notes

If a data value is assigned by a column property (such as Missing Value Codes), use Col Stored Value() to base the calculation on the value stored in the column instead.

See Col Stored Value(<dt>, col, <row>).

Fit Censored(Distribution("name"), YLow(vector) | Y(Vector), <YHigh(vector)>, <Weight(vector)>, <X(matrix)>, <Z(matrix)>, <HoldParm(vector)>, <Use random sample to compute initial values(percent)>, <Use first N observations to compute initial values(nobs)>)

Description

Fits a distribution using censored data.

Returns

A list that contains parameter estimates, the covariance matrix, the log-likelihood, the AICc, the BIC, and a convergence message. See Likelihood, AICc, and BIC in Fitting Linear Models.

Arguments

Distribution("name")

The quoted name of the distribution to fit.

YLow(vector) | Y(Vector)

If you do not have censoring, then use Y and an array of your data, and do not specify YHigh. If you do have censoring, then specify YLow and YHigh as the lower and upper censoring values, respectively.

Optional Arguments

YHigh(vector)

A vector that contains the upper censoring values. Specify this only if you have censoring and also specify YLow.

Weight(vector)

A vector that contains the weight values.

X(matrix)

The regression design matrix for location.

Z(matrix)

The regression design matrix for scale.

HoldParm(vector)

An array of specified parameters. The parameters should be nonmissing where they are to be held fixed, and missing where they are to be estimated. This is primarily used to test hypotheses that certain parameters are zero or some other specific value.

Use random sample to compute initial values(percent)

A percent of the observations to be used in the computation of the initial values. Specify this if the data vector is large.

Use first N observations to compute initial values(nobs)

A number of observations at the start of the data vector to be used in the computation of the initial values. Specify this if the data vector is large.

Fit Circle(Xvec, Yvec)

Description

Fits a circle that best goes through three or more points using a least squares approach. If only three points are specified, a direct solution can be found, and the sum of squared errors is zero.

Returns

A list that contains the X and Y coordinates of the center point of the circle, the length of the radius, and the sum of squared errors.

Arguments

Xvec

Vector of X coordinates of three or more points.

Yvec

Vector of Y coordinates of three or more points.

Syntax

`{xCenter, yCenter, radius, SSE} = Fit Circle(Xvec, Yvec)`
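As a minimal sketch of the three-point case, the following script passes three points that lie on the unit circle centered at the origin. The variable names and data are hypothetical.

```jsl
// Three points on the circle x^2 + y^2 = 1
xvec = [1, 0, -1];
yvec = [0, 1, 0];
// Unpack the returned list into four variables
{xCenter, yCenter, radius, sse} = Fit Circle( xvec, yvec );
Show( xCenter, yCenter, radius, sse );
```

With exactly three points the fit is direct, so the center should come out close to (0, 0), the radius close to 1, and the sum of squared errors close to zero, up to floating-point rounding.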

Hier Clust(x)

Description

Returns the clustering history for a hierarchical clustering using Ward’s method (without standardizing data).

Argument

x

A data matrix.

IRT Ability(Q1, <Q2, Q3, ... Qn,> parmMatrix)

Description

Returns scores for the latent variable in an item response theory model with n binary items and a matrix of known parameters. The parameter matrix should contain as many rows as there are parameters in the model and as many columns as there are items in the analysis.

Arguments

Q1, Q2, ..., Qn

A set of n binary items.

parmMatrix

A matrix of parameters from an item response theory model.

See Item Analysis Platform Options in Multivariate Methods.

KDE(vector, <named arguments>)

Description

Returns a kernel density estimator with automatic bandwidth selection.

Argument

vector

A vector.

Optional Named Arguments

<<weights

Must be a vector of the same length as vector, and can contain any nonnegative real numbers. The weights represent frequencies, counts, or similar concepts.

<<bandwidth(n)

A nonnegative real number. Enter a value of 0 to use the bandwidth selection argument.

<<bandwidth scale(n)

A positive real number.

<<bandwidth selection(n)

n must be 0, 1, 2, or 3, corresponding to Sheather and Jones, Normal Reference, Silverman rule of thumb, or Oversmoother, respectively.

<<kernel(n)

n must be 0, 1, 2, 3, or 4, corresponding to Gaussian, Epanechnikov, Biweight, Triangular, or Rectangular, respectively.

LenthPSE(x)

Description

Returns Lenth’s pseudo-standard error of the values within a vector.

Argument

x

A vector.

Maximum(var1, var2, ...)

Max(var1, var2, ...)

Description

Returns the maximum value of the arguments or of the values within a single matrix or list argument. If multiple arguments are specified, they must be all numeric values or all quoted strings.

Mean(var1, var2, ...)

Description

Returns the arithmetic mean of the arguments or of the values within a single matrix or list argument.

Median(var1, var2, ...)

Description

Returns the median of the arguments or of the values within a single matrix or list argument.
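A short sketch of the calling forms for Mean and Median, using hypothetical values:

```jsl
Show( Mean( 1, 2, 3, 4 ) );   // 2.5, separate arguments
Show( Mean( [1 2 3 4] ) );    // 2.5, single matrix argument
Show( Median( 5, 1, 3 ) );    // 3, separate arguments
Show( Median( {5, 1, 3} ) );  // 3, single list argument
```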

Minimum(var1, var2, ...)

Min(var1, var2, ...)

Description

Returns the minimum value of the arguments or of the values within a single matrix argument. If multiple arguments are specified, they must be either all numeric values or all quoted strings.
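A short sketch of Maximum and Minimum with numeric arguments, using hypothetical values:

```jsl
Show( Maximum( 3, 9, 1 ) );  // 9
Show( Max( [3 9 1] ) );      // 9, single matrix argument
Show( Minimum( 3, 9, 1 ) );  // 1
Show( Min( [3 9 1] ) );      // 1, single matrix argument
```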

N Missing(expression)

Description

Returns the rowwise number of missing values in the specified variables.

Number(var1, var2, ...)

Description

Returns the rowwise number of nonmissing values in the specified variables.

Product(i=initialValue, limitValue, bodyExpr)

Description

Multiplies the results of bodyExpr as i runs from initialValue through limitValue and returns a single product.
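For instance, the following sketch computes 5 factorial by multiplying the index itself over the range 1 through 5:

```jsl
// 1 * 2 * 3 * 4 * 5
p = Product( i = 1, 5, i );
Show( p ); // p = 120;
```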

Quantile(p, arguments)

Description

Returns the pth quantile of the arguments. The first argument can be a scalar or a matrix of values between 0 and 1. The remaining arguments can also be specified as values within a single matrix or list argument.
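The calling forms described above can be sketched as follows, using hypothetical values. Only the median results are annotated, because the middle of five sorted values is unambiguous.

```jsl
// Scalar p with the data passed as separate arguments
Show( Quantile( 0.5, 1, 2, 3, 4, 5 ) ); // 3
// Scalar p with the data passed as a single matrix
Show( Quantile( 0.5, [1 2 3 4 5] ) );   // 3
// A matrix of probabilities requests several quantiles in one call
q = Quantile( [0.25 0.5 0.75], [1 2 3 4 5] );
Show( q );
```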

Range(var1, var2, ...)

Description

Returns the minimum and maximum values of the arguments. The result is returned as a two-element row vector that contains the minimum and the maximum.
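A brief sketch with three hypothetical values:

```jsl
r = Range( 4, 9, 2 );
Show( r ); // r = [2 9];
// The elements can be read back by position
lowest = r[1];
highest = r[2];
```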

Robust PCA(X, <Lambda(2/sqrt(max(nrow, ncol)))>, <tolerance=1e-10>, <maxit(75)>, <Center(1)>, <Scale(1)>)

Description

Performs a sequence of singular value decompositions and thresholding steps to decompose the data matrix into a low-rank matrix and a sparse matrix of residuals.

Returns

A

The low-rank matrix estimation.

E

The sparse matrix of residuals.

S

A vector of singular values.

Arguments

X

A data matrix.

Lambda

Specifies a value greater than 0 that determines the sparsity of the matrix of residuals. For larger values of Lambda, the matrix of residuals is more sparse.

tolerance

The convergence criterion.

maxit

The maximum number of SVD iterations.

Center

Centers the data prior to performing the SVD iterations.

Scale

Scales the data prior to performing the SVD iterations.

Std Dev(var1, var2, ...)

Description

Returns the rowwise standard deviation of the specified variables.

Sum(var1, var2, ...)

Description

Returns the rowwise sum of the specified variables. Calculating all missing values (Sum(.,.)) returns missing.
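The rowwise functions can be tried on literal values before they are used in a column formula, where the arguments would typically be column references. The values below are hypothetical.

```jsl
Show( Sum( 1, 2, 3 ) );     // 6
Show( Sum( ., . ) );        // missing, because every argument is missing
Show( Number( 1, ., 3 ) );  // 2 nonmissing values
Show( Std Dev( 2, 4, 6 ) ); // 2
```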

SSQ(x1, ...)

Description

Returns the sum of squares of all elements. Takes numbers, matrices, or lists as arguments and returns a scalar number. Skips missing values.
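A short sketch with hypothetical values, mixing the argument types listed above:

```jsl
Show( SSQ( 1, 2, 3 ) );     // 1 + 4 + 9 = 14
Show( SSQ( [1 2 3] ) );     // 14, single matrix argument
Show( SSQ( [1 2 3], 4 ) );  // 14 + 16 = 30, matrix and number together
```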

Summarize(<dt>, <by>, <count>, <sum>, <mean>, <min>, <max>, <stddev>, <corr>, <quantile>, <first>)

Description

Gathers summary statistics for a data table and stores them in global variables.

Returns

None.

Arguments

dt

Optional positional argument: a reference to a data table. If this argument is not in the form of an assignment, then it is considered a data table expression.

All other arguments are optional and can be included in any order. Typically, each argument is assigned to a variable so you can display or manipulate the values further.

name=By(col | list | Eval)

Using a BY variable changes the output from single values for each statistic to a list of values for each group in the BY variable.
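As a hedged sketch of the By behavior, the following script summarizes height by sex in the Big Class sample table. The variable names groups and avgHeight are illustrative.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// groups receives the By levels; avgHeight receives one mean per level
Summarize( dt, groups = By( :sex ), avgHeight = Mean( :height ) );
Show( groups, avgHeight );
```

Without the By argument, avgHeight would hold a single value for the whole table rather than one value per group.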

Summarize YByX(X(<x columns>), Y(<y columns>), Group(<grouping columns>), Freq(<freq column>), Weight(<weight column>))

Description

Calculates all Fit Y by X combinations on large-scale data sets.

Returns

A data table of p-values and LogWorth values for each Y and X combination.

Arguments

X(col)

The factor columns used in the fit model.

Y(col)

The response columns used in the fit model.

Group(gcol)

The group of columns used in the fit model.

Freq(col)

The frequency (for each row) column used in the fit model.

Weight(col)

The importance (or influence) column used in the fit model.

Notes

Performs the same function as the Response Screening platform.

See The PValues Data Table and Response Screening in Predictive and Specialized Modeling.

Summation(init, limitvalue, body)

Description

Sums the results of the body expression as the index set in init (for example, i = 1) runs through limitvalue, and returns a single value.
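For instance, the following sketch adds the squares of the integers 1 through 10:

```jsl
// 1^2 + 2^2 + ... + 10^2
s = Summation( i = 1, 10, i ^ 2 );
Show( s ); // s = 385;
```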

Tolerance Limit(1-alpha, p, n)

Description

Constructs a 1-alpha confidence interval to contain proportion p of the means with sample size n.