Technical Details for the Fit Definitive Screening Platform

The Effective Model Selection for DSDs Algorithm

This section provides a summary of the algorithm used in the Fit Definitive Screening platform. See Jones and Nachtsheim (2016).

Decomposition of Response

The Effective Model Selection algorithm expresses the response, Y, as the sum of two components, YME and Y2nd, so that Y = YME + Y2nd.

– YME is the predicted value obtained from a regression of Y on the main effects and fake factors.

There is no need to include the block factor in YME because of the fold-over structure of the design. The block factor is included in Y2nd.

– Y2nd is given by Y2nd = Y - YME.

Note: In a DSD, the columns YME and Y2nd are orthogonal.
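The decomposition can be illustrated with a minimal Python sketch. It assumes a hypothetical 9-run, four-factor DSD built from an order-4 conference matrix, treats the fourth column as a fake factor, uses made-up response values, and fits no intercept (each main-effect column of this fold-over design sums to zero). Because the main-effect columns of this particular design are mutually orthogonal, each coefficient can be computed separately. This is an illustration under those assumptions, not the platform's implementation.

```python
# Hypothetical order-4 conference matrix; its columns are mutually orthogonal.
C = [[0, 1, 1, 1],
     [-1, 0, -1, 1],
     [-1, 1, 0, -1],
     [-1, -1, 1, 0]]

# DSD = conference matrix, its fold-over (mirror image), and one center run.
design = C + [[-v for v in row] for row in C] + [[0, 0, 0, 0]]

# Made-up response values, one per run (illustrative only).
y = [12.1, 7.4, 9.8, 6.3, 5.2, 10.9, 8.0, 11.5, 8.7]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Columns for three real factors plus one fake factor (the fourth column).
columns = [[row[j] for row in design] for j in range(4)]

# With mutually orthogonal columns, each regression coefficient is x'y / x'x.
coefs = [dot(x, y) / dot(x, x) for x in columns]

# YME: fitted values from the main-effect and fake-factor columns.
y_me = [sum(b * x[i] for b, x in zip(coefs, columns)) for i in range(len(y))]

# Y2nd: what remains of Y after removing YME.
y_2nd = [yi - mi for yi, mi in zip(y, y_me)]

# The inner product of YME and Y2nd is zero up to floating-point rounding.
print(abs(dot(y_me, y_2nd)) < 1e-9)
```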

The analysis proceeds in two stages:

• Stage 1: The response YME is used to identify the main effects that are considered active.

• Stage 2: The response Y2nd is used to identify second-order effects. Stage 2 considers all second-order terms in the active main effects from Stage 1 and determines a subset of these containing effects considered to be active.

Note: If there is a blocking factor, it is included in the Stage 2 list of effects even if it is not significant.

Stage 1 Methodology

The Stage 1 methodology depends on whether the design contains fake factors or centerpoint replicates.

Case 1: Fake Factors or Centerpoint Replicates Available

1. Using the fake factors or centerpoint replicates, an estimator of error variance that is independent of the model is constructed. Assuming that there are no active odd-order effects of third order or higher, this estimate is unbiased.

2. Using YME, main effects are tested against this estimate. Main effects with p-values less than a threshold p-value are considered active. The threshold values are the following:

– For one error degree of freedom, the threshold value is 0.20.

– For two error degrees of freedom, the threshold value is 0.10.

– For more than two error degrees of freedom, the threshold value is 0.05.

– If you specify a p-value, that value is used as the threshold instead of the defaults above.

Note: To specify a different p-value threshold, select Set Stage 1 p value from the Fit Definitive Screening red triangle menu.

3. If no main effect has a p-value less than the threshold value, conclude that there are no active main effects and no active two-factor effects. The procedure terminates.

4. If active main effects are found, then variability from the inactive main effects is pooled into the error variance constructed in (1).

Note: If categorical factors are in the design, the estimated coefficients are recalculated each time a main effect is chosen as active.
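The default thresholds and the selection and pooling steps above can be sketched in Python. The function names and the p-values passed in are hypothetical; computing the p-values themselves (testing each main effect against the error estimate) is not shown. The pooling function assumes that each inactive main effect contributes one degree of freedom, which is the case for continuous factors.

```python
def stage1_threshold(df_error, user_p=None):
    """Default p-value threshold by error degrees of freedom.

    A user-specified p-value, if given, overrides the defaults.
    """
    if user_p is not None:
        return user_p
    if df_error == 1:
        return 0.20
    if df_error == 2:
        return 0.10
    return 0.05


def select_active(p_values, df_error, user_p=None):
    """Return the main effects whose p-values fall below the threshold.

    p_values: dict mapping effect name -> p-value (supplied by the caller).
    An empty result means no active main effects, so the procedure stops.
    """
    threshold = stage1_threshold(df_error, user_p)
    return [name for name, p in p_values.items() if p < threshold]


def pooled_error_variance(ss_error, df_error, ss_inactive):
    """Pool sums of squares of inactive main effects into the error estimate.

    Assumes one degree of freedom per inactive main effect.
    """
    return (ss_error + sum(ss_inactive)) / (df_error + len(ss_inactive))


# Made-up p-values for three main effects, with one error degree of freedom.
p_values = {"A": 0.01, "B": 0.15, "C": 0.50}
print(select_active(p_values, df_error=1))  # ['A', 'B']
```

With three or more error degrees of freedom, the same made-up p-values would leave only A active, because the default threshold drops to 0.05.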

Case 2: No Fake Factors or Centerpoint Replicates Available

In this case, there is no model-independent estimator of error variance available. Subsets of main effects are tested sequentially against an estimate of error variance constructed from the inactive main effects. Suppose that there are m main effects.

1. The absolute values of the estimated effects, using YME as the response, are ordered from largest to smallest.

2. For each 1 ≤ i < m, the effect with the ith largest absolute value is tested against the adjusted residual sum of squares for the model containing that effect and all effects with larger absolute values.

3. The effects in the model with the smallest p-value are considered to be the active effects.

4. If active main effects are found, then variability from the inactive main effects is used to construct an estimate of error variance, using YME as the response.

Note: For the Fit Definitive Screening procedure to work properly in Case 2, at least one of the main effects must be active and at least one must be inactive. If no main effects are active, or if all main effects are active, the procedure will identify a set of main effects, but the procedure for arriving at that subset is compromised.
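The sequential test in Case 2 can be sketched as follows. This is a simplified illustration, not the platform's implementation: it assumes that the effect estimates are uncorrelated with equal variance (so that squared estimates are proportional to sums of squares), it uses the plain residual sum of squares rather than the adjusted one mentioned in step 2, and it computes the p-value with a hand-written numerical integration of the t density. The function names and the effect estimates are hypothetical.

```python
import math


def two_sided_t_pvalue(t, df, steps=2000):
    """Two-sided p-value for a t statistic, using Simpson's rule on the t density."""
    t = abs(t)
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3  # approximates P(0 < T < t)
    return max(0.0, 1 - 2 * area)


def select_active(effects):
    """Pick the leading effects whose sequential test has the smallest p-value.

    effects: dict mapping effect name -> estimated effect.
    """
    # Step 1: order effects by absolute value, largest first.
    ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
    best_p, best_i = None, 0
    # Step 2: for each 1 <= i < m, test the ith largest effect against the
    # variability of the m - i smaller effects.
    for i in range(1, len(ranked)):
        smaller = [effects[name] ** 2 for name in ranked[i:]]
        mse = sum(smaller) / len(smaller)
        if mse == 0:
            p = 0.0
        else:
            t = abs(effects[ranked[i - 1]]) / math.sqrt(mse)
            p = two_sided_t_pvalue(t, df=len(smaller))
        # Step 3: keep the model with the smallest p-value.
        if best_p is None or p < best_p:
            best_p, best_i = p, i
    return ranked[:best_i]


# Made-up effect estimates: two large effects and three small ones.
effects = {"A": 10.0, "B": 8.0, "C": 0.5, "D": -0.3, "E": 0.2}
print(select_active(effects))
```

Because i runs only up to m - 1, the sketch always returns at least one and at most m - 1 effects, which mirrors the caveat in the note above.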

Stage 2 Methodology

In Stage 2, the factors considered depend on the Strong Heredity options. When strong heredity is selected, only second-order effects involving the factors whose main effects are identified as active in Stage 1 are considered. The Stage 2 methodology depends on the number of active main effects identified in Stage 1.
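As a small illustration of the candidate set under strong heredity, the following Python sketch lists every second-order term (quadratic and two-factor interaction) in a set of active main effects. The function name and the `A*B` term labels are hypothetical conventions chosen for this example.

```python
from itertools import combinations


def second_order_candidates(active):
    """List all quadratic and two-factor interaction terms in the active factors."""
    quadratics = [f"{a}*{a}" for a in active]
    interactions = [f"{a}*{b}" for a, b in combinations(active, 2)]
    return quadratics + interactions


print(second_order_candidates(["A", "B", "C"]))
# ['A*A', 'B*B', 'C*C', 'A*B', 'A*C', 'B*C']
```

With k active main effects there are k(k + 1)/2 such terms, so seven active main effects give 28 candidates, not counting a block factor.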

Case 1: Seven or Fewer Active Main Effects

Stage 2 uses a guided subset selection procedure. The goal is to continue to add second-order effects to the model as long as the ratio of the RMSE from Stage 2 to the RMSE from Stage 1 is greater than the specified threshold. A ratio that is less than or equal to the threshold indicates that there are no additional second-order effects to add to the model. The default threshold is 1. Smaller thresholds increase the number of terms likely to be identified as active, compared to larger thresholds.

Note: To specify an RMSE ratio threshold other than 1, select Set Stage 2 ratio from the Fit Definitive Screening red triangle menu.

For Stage 2, the p-value thresholds are the following:

• For one error degree of freedom, the threshold value is 0.20.

• For two error degrees of freedom, the threshold value is 0.10.

• For more than two error degrees of freedom, the threshold value is 0.05.

• If you specify a p-value, that value is used as the threshold instead of the defaults above.

1. The variability for Y2nd is tested against the error estimate from Stage 1 to determine if there is additional variability due to second-order effects.

– If the p-value for this test exceeds the threshold value, the procedure terminates and no active second-order effects are identified.

2. If the p-value for this test is less than or equal to the threshold value, then subsets of size k, k = 1,2,3,... are successively tested, starting with k = 1.

3. For each k, the residual sum of squares for each subset of that size is tested against the error estimate from Stage 1. The subset with the smallest RMSE is identified.

4. The procedure continues until a k is found with a ratio of RMSE to the Stage 1 RMSE smaller than the Stage 2 ratio threshold.

5. The effects in the subset preceding the one that corresponds to the terminal value of k are considered to be the active two-factor effects.
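The subset search in step 3 can be sketched in Python for a fixed k. For a given k, every candidate subset has the same number of terms, so ranking subsets by residual sum of squares gives the same order as ranking by RMSE. The sketch assumes that an intercept is always fitted, ignores any block factor, and uses made-up Y2nd values; the function names are hypothetical. The test against the Stage 1 error estimate and the stopping rule in steps 4 and 5 are not shown.

```python
from itertools import combinations
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def residual_ss(columns, y):
    """Residual sum of squares after least-squares projection of y onto columns.

    Uses modified Gram-Schmidt; linearly dependent columns are skipped.
    """
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(v, q)
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-10:
            basis.append([a / norm for a in v])
    resid = list(y)
    for q in basis:
        c = dot(resid, q)
        resid = [a - c * b for a, b in zip(resid, q)]
    return dot(resid, resid)


def best_subset(terms, y, k):
    """Return (rss, names) for the size-k subset of terms with the smallest RSS.

    An intercept column is always included (an assumption of this sketch).
    """
    intercept = [1.0] * len(y)
    return min(
        (residual_ss([intercept] + [terms[n] for n in names], y), names)
        for names in combinations(sorted(terms), k)
    )


# Two active factors from a hypothetical 9-run DSD.
A = [0, -1, -1, -1, 0, 1, 1, 1, 0]
B = [1, 0, 1, -1, -1, 0, -1, 1, 0]
terms = {
    "A*A": [a * a for a in A],
    "B*B": [b * b for b in B],
    "A*B": [a * b for a, b in zip(A, B)],
}

# Made-up Y2nd values: roughly 2 * (A*B) plus small noise.
y_2nd = [0.1, -0.1, -1.95, 2.0, -0.05, 0.1, -2.0, 1.9, 0.0]

rss, names = best_subset(terms, y_2nd, k=1)
print(names)  # ('A*B',)
```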

Case 2: Eight or More Active Main Effects

Stage 2 uses forward selection for second-order terms when eight or more active main effects are identified in Stage 1.
