The convergence failure warning shows the results of a score test of the following hypothesis: that the unknown maximum likelihood estimate (MLE) is consistent with the parameter value from the final iteration of the model-fitting algorithm. This test is possible because the relative gradient criterion is algebraically equivalent to the score test statistic. Remarkably, the score test does not require knowledge of the true MLE.
Consider first the case of a single parameter, θ. Let l be the log-likelihood function for θ and let x be the data. The score is the derivative of the log-likelihood function with respect to θ:

$$U(\theta) = \frac{\partial l(\theta; x)}{\partial \theta}$$

The score test statistic for the null hypothesis H0: θ = θ0 is:

$$S(\theta_0) = \frac{U(\theta_0)^2}{I(\theta_0)}$$

where I(θ0) is the Fisher information evaluated at θ0. This statistic has an asymptotic Chi-square distribution with 1 degree of freedom under the null hypothesis.
The score test can be generalized to multiple parameters. Consider the vector of parameters θ, with score vector U(θ) consisting of the partial derivatives of the log-likelihood. The test statistic for the score test of H0: θ = θ0 is:

$$S(\theta_0) = U(\theta_0)^\top I(\theta_0)^{-1} U(\theta_0)$$

where I(θ0) is the Fisher information matrix.
The test statistic has an asymptotic Chi-square distribution with k degrees of freedom, where k is the number of unbounded parameters.
The convergence criterion for the Mixed Model fitting procedure is based on the relative gradient, $g(\theta)^\top H(\theta)^{-1} g(\theta)$, where $g(\theta)$ is the gradient of the log-likelihood function and $H(\theta)$ is its Hessian.
Let θ* denote the value of θ at which the algorithm terminates. Note that the relative gradient evaluated at θ* is the score test statistic for H0: θ = θ*. A p-value is calculated using a Chi-square distribution with k degrees of freedom, where k equals the number of unbounded parameters listed in the Random Effects Covariance Parameter Estimates report. This p-value gives an indication of whether the value of the unknown MLE is consistent with θ*.
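As an illustration of the mechanics, the following sketch computes the single-parameter score test for an i.i.d. Poisson sample. The Poisson model, the function name, and the data are illustrative choices, not part of JMP's implementation; the point is that the statistic needs only the hypothesized value, not the MLE.

```python
import math

def poisson_score_test(data, lam0):
    """Score test statistic U(lam0)^2 / I(lam0) for H0: lambda = lam0,
    where the data are i.i.d. Poisson(lambda). Illustrative sketch only."""
    n = len(data)
    # Score: derivative of the Poisson log-likelihood at lam0.
    u = sum(data) / lam0 - n
    # Fisher information for n Poisson observations.
    info = n / lam0
    return u * u / info

data = [1, 2, 3, 2]
# Testing at the sample mean gives a statistic of exactly 0.
print(poisson_score_test(data, 2.0))  # 0.0
# Testing a value away from the mean gives a positive statistic.
print(poisson_score_test(data, 1.0))  # 4.0
```

The statistic would then be compared to a Chi-square distribution with 1 degree of freedom.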
The standard random coefficient model specifies a random intercept and a random slope for each subject. Let yij denote the jth measurement on the ith subject, taken at covariate value xij. Then the random coefficient model can be written as follows:

$$y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})x_{ij} + \varepsilon_{ij}, \quad b_i = (b_{0i}, b_{1i})^\top \sim N(0, G), \quad \varepsilon_{ij} \sim N(0, \sigma^2)$$
You can reformulate the model to reflect the fixed and random components that are estimated by JMP as follows:

$$y = X\beta + Zb + \varepsilon, \quad b \sim N(0, G), \quad \varepsilon \sim N(0, \sigma^2 I)$$

with G and σ² defined as above.
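The fixed-plus-random formulation implies the marginal covariance Var(y) = ZGZ′ + σ²I. The sketch below builds this matrix for a hypothetical subject with a random intercept and slope measured at three time points; all numeric values and names are illustrative, not estimates from any fitted model.

```python
import numpy as np

def marginal_covariance(Z, G, sigma2):
    """Var(y) = Z G Z' + sigma2 * I for one subject's observations."""
    n = Z.shape[0]
    return Z @ G @ Z.T + sigma2 * np.eye(n)

# One subject, 3 time points, columns of Z for intercept and slope.
times = np.array([0.0, 1.0, 2.0])
Z = np.column_stack([np.ones_like(times), times])
# Illustrative random-effects covariance and residual variance.
G = np.array([[1.0, 0.2],
              [0.2, 0.5]])
V = marginal_covariance(Z, G, sigma2=0.3)
print(V)
```

Note that the variance grows with time here because the random slope contributes more variability at later time points.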
The form of the repeated measures model is yijk = αij + sik + eijk, where
αij is the fixed effect for the ith treatment at the jth time, which can be written as a treatment-by-time factorial
sik is the random effect of the kth subject assigned to the ith treatment
j = 1,…,m denotes the repeated measurements over time.
Assume that the sik are independent and identically distributed N(0, σs²) variables. Denote the number of treatment factors by t and the number of subjects by s. Then the vector of errors is distributed N(0, Σ), where Σ is block diagonal, with one block for each subject.
Denote the block diagonal component of the covariance matrix Σ corresponding to the ikth subject within treatment by Σik. In other words, Σik = Var(yik|sik). Because observations over time within a subject are not typically independent, it is necessary to estimate the variance of yijk|sik. Failure to account for the correlation leads to distorted inference. The following sections describe the structures available for Σik.
The covariance between observations taken at times j and j′ within the same subject is:

$$\mathrm{Cov}(y_{ijk}, y_{ij'k}) = \sigma_{jj'}$$
Observations at every time have a unique variance and observations within the same subject at every pair of distinct times have a unique covariance.
For the AR(1) structure, the covariance between observations at times tj and tj′ is:

$$\mathrm{Cov}(y_{ijk}, y_{ij'k}) = \sigma^2 \rho^{|t_j - t_{j'}|}$$

Here tj is the time of observation j. In this structure, observations taken at any given time have the same variance, σ². The parameter ρ, where -1 < ρ < 1, is the correlation between two observations that are one unit of time apart. As the time difference between observations increases, their covariance decreases in magnitude because ρ is raised to a higher power. In many applications, AR(1) provides an adequate model of the within-subject correlation, providing more power without sacrificing Type I error control.
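The AR(1) covariance matrix can be assembled directly from the observation times. The following sketch does so; the function name and the parameter values are illustrative.

```python
import numpy as np

def ar1_covariance(times, sigma2, rho):
    """AR(1) covariance: Cov(y_j, y_j') = sigma2 * rho**|t_j - t_j'|.
    Illustrative construction of the within-subject covariance matrix."""
    t = np.asarray(times, dtype=float)
    # Matrix of absolute time differences between all pairs of observations.
    lags = np.abs(t[:, None] - t[None, :])
    return sigma2 * rho ** lags

# Four equally spaced measurements; covariance decays geometrically in the lag.
V = ar1_covariance([1, 2, 3, 4], sigma2=2.0, rho=0.5)
print(V)
```

With ρ = 0.5, adjacent observations have covariance σ²/2, and observations three units apart have covariance σ²/8.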
In JMP, a compound symmetry covariance structure is implemented using the independent errors, mixed-model approach. Random effects are classified into two categories: G-side or R-side. See Searle, Casella, and McCulloch (1992) for additional details.
The G-side random effects are associated with the design matrix for random effects. The R-side random effects are associated with residual error. Between-subject variance is part of the design structure and is modeled on the G-side. Within-subject variance falls into the residual structure and is modeled on the R-side. In the independent structure:
The random effects G-side variance is modeled by sik ~ iid N(0, σs²).
The R-side variance is modeled by eijk ~ iid N(0, σ²).
Together, these imply the compound symmetry covariance

$$\mathrm{Var}(y_{ik}) = \sigma_s^2 J + \sigma^2 I$$

where J is a matrix consisting of 1s and I is an identity matrix.
Alternatively, all of the variance could be modeled on the R-side. Under the Gaussian assumption, this compound symmetry covariance structure (Type=CS in SAS) is equivalent to the independent-errors model above. This structure is not available in JMP and is listed here for informational purposes only.
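As a quick numeric check of the equivalence described above, the sketch below builds the compound symmetry matrix σs²J + σ²I for one hypothetical subject and verifies that it matches the G-side construction, in which the random subject effect enters through a design column of 1s.

```python
import numpy as np

def cs_covariance(m, sigma_s2, sigma2):
    """Compound symmetry covariance sigma_s2 * J + sigma2 * I
    for m repeated measures on one subject. Illustrative sketch."""
    return sigma_s2 * np.ones((m, m)) + sigma2 * np.eye(m)

m, sigma_s2, sigma2 = 4, 1.5, 0.5
# Equivalent G-side construction: Z is a column of ones, G = [sigma_s2].
Z = np.ones((m, 1))
V_gside = Z @ np.array([[sigma_s2]]) @ Z.T + sigma2 * np.eye(m)
print(np.allclose(cs_covariance(m, sigma_s2, sigma2), V_gside))  # True
```

Both constructions give a matrix with σs² + σ² on the diagonal and σs² off the diagonal.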
Consider the simple model yi = μ + ei. The spatial or temporal structure is modeled through the error term, ei. In general, the spatial correlation model is defined by E(ei) = 0 and Cov(ei, ej) = σ²ρij, where ρij is the correlation between ei and ej.
Let si denote the location of yi, where si is specified by coordinates reflecting space or time. The spatial or temporal structure is typically restricted by assuming that the covariance is a function of the Euclidean distance, dij, between si and sj. The covariance can then be written as Cov(yi, yj) = σ²f(dij), where f(dij) represents the correlation between observations yi and yj.
In the case of two or more location coordinates, if f(dij) does not depend on direction, then the covariance structure is isotropic. If it does, then the structure is anisotropic.
The correlation structures for spatial models available in JMP are shown below. These are parametrized by ρ, which is positive unless it is otherwise constrained.
For an anisotropic model, the correlation function contains a parameter, ρk, for each coordinate direction.
When the spatial process is second-order stationary, the structures listed above define variograms. Borrowed from geostatistics, the variogram is the standard tool for describing and estimating spatial variability. It measures spatial variability as a function of the distance, dij, between observations using the semivariance.
Let Z(s) denote the value of the response at a location s. The semivariance between observations at si and sj is given as follows:

$$\gamma(s_i, s_j) = \tfrac{1}{2}\,\mathrm{Var}\!\left[Z(s_i) - Z(s_j)\right]$$
If the process is isotropic, the semivariance depends only on the distance h between points and the function can be written as follows:

$$\gamma(h) = \tfrac{1}{2}\,\mathrm{Var}\!\left[Z(s) - Z(s + h)\right]$$
Defined as the value of the semivariogram at the plateau reached for larger distances. It corresponds to the variance of an observation. In models with no nugget effect, the sill is σ². In models with a nugget effect, the sill is σ² + c1, where c1 represents the nugget. The partial sill is defined as the sill minus the nugget, σ².
Defined as the distance at which the semivariogram reaches the sill. At distances less than the range, observations are spatially correlated. For distances greater than or equal to the range, spatial correlation is effectively zero. In spherical models, ρ is the range. In exponential models, 3ρ is the practical range. In Gaussian models, √3·ρ is the practical range. The practical range is defined as the distance at which the correlation drops to 5% of its maximum, so that the semivariogram reaches 95% of the sill.
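The practical-range statements above can be checked numerically. The sketch below uses the standard geostatistical forms of the exponential, Gaussian, and spherical correlation functions; these parameterizations are assumptions here and may differ from JMP's internal forms.

```python
import math

def exponential(d, rho):
    """Exponential correlation: exp(-d / rho)."""
    return math.exp(-d / rho)

def gaussian(d, rho):
    """Gaussian correlation: exp(-(d / rho)^2)."""
    return math.exp(-(d / rho) ** 2)

def spherical(d, rho):
    """Spherical correlation; exactly zero at and beyond the range rho."""
    if d >= rho:
        return 0.0
    r = d / rho
    return 1.0 - 1.5 * r + 0.5 * r ** 3

rho = 2.0
# Practical range: correlation falls to about 5% of its value at d = 0.
print(round(exponential(3 * rho, rho), 3))          # 0.05
print(round(gaussian(math.sqrt(3) * rho, rho), 3))  # 0.05
print(spherical(rho, rho))                          # 0.0
```

Both exp(-3) checks land at approximately 0.0498, consistent with the 5% definition, while the spherical model reaches exact zero correlation at its range.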
The repeated effects covariance parameter estimates represent the various semivariogram features:
For a given isotropic spatial structure, the estimated variogram is obtained using a nonlinear least squares fit of the empirical semivariances to the appropriate function listed above.
To compute the empirical semivariance, the distances between all pairs of points for the variables selected for the variogram covariance are computed. The range of the distances is divided into 10 equal intervals. If the data do not allow for 10 intervals, then as many intervals as possible are constructed.
Distance classes consisting of pairs of points are constructed. The hth distance class consists of all pairs of points whose distances fall in the hth interval.
N(h): the distance class consisting of pairs of points whose distances fall into the hth interval

Z(x): the value of the response at x, where x is a vector of temporal or spatial coordinates

The semivariance function, γ, is defined as follows:

$$\gamma(h) = \frac{1}{2\,|N(h)|}\sum_{(i,j)\in N(h)}\left[Z(x_i) - Z(x_j)\right]^2$$

where |N(h)| is the number of pairs of points in N(h).
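The empirical semivariance computation described above can be sketched as follows. The binning convention (left-open intervals, with the first interval closed at zero) and the function name are illustrative choices, not JMP's exact rules.

```python
import math
from itertools import combinations

def empirical_semivariogram(coords, z, n_bins=10):
    """Empirical semivariance by distance class: for each of n_bins
    equal-width distance intervals, average (z_i - z_j)^2 / 2 over the
    pairs whose distance falls in that interval. Illustrative sketch."""
    pairs = []
    for i, j in combinations(range(len(z)), 2):
        d = math.dist(coords[i], coords[j])
        pairs.append((d, (z[i] - z[j]) ** 2))
    max_d = max(d for d, _ in pairs)
    width = max_d / n_bins
    gamma = []
    for h in range(n_bins):
        lo, hi = h * width, (h + 1) * width
        if h == 0:
            cls = [sq for d, sq in pairs if d <= hi]
        else:
            cls = [sq for d, sq in pairs if lo < d <= hi]
        # Empty classes yield None rather than a semivariance of zero.
        gamma.append(sum(cls) / (2 * len(cls)) if cls else None)
    return gamma

# Four points on a line with increasing response values.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
z = [1.0, 2.0, 4.0, 7.0]
print(empirical_semivariogram(coords, z, n_bins=3))
```

For this toy example the semivariance increases with distance, as expected for a spatially trending response.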
The variance matrix of the fixed effects is always modified to include a Kackar-Harville correction. The variance matrix of the BLUPs, and the covariances between the BLUPs and the fixed effects, are not Kackar-Harville corrected. The rationale for this approach is that the corrections for BLUPs can be computationally and memory intensive when the random effects have many levels. In SAS, the Kackar-Harville correction is applied to both fixed effects and BLUPs only when the DDFM=KENWARDROGER option is set.
For covariance structures that have nonzero second derivatives with respect to the covariance parameters, the Kenward-Roger covariance matrix adjustment includes a second-order term. This term can result in standard error shrinkage. Also, the resulting adjusted covariance matrix can then be indefinite and is not invariant under reparameterization. The first-order Kenward-Roger covariance matrix adjustment eliminates the second derivatives from the calculation. All spatial structures and the AR(1) structure are covariance structures that generally lead to nonzero second derivatives.
The degrees of freedom for tests involving only linear combinations of fixed effect parameters are calculated using the first-order Kenward-Roger correction. So JMP’s results for these tests match PROC MIXED using the DDFM=KENWARDROGER(FIRSTORDER) option. If there are BLUPs in the linear combination, JMP uses a Satterthwaite approximation to get the degrees of freedom. The results then follow a pattern similar to what is described for standard errors in the preceding paragraph.