Profile confidence limits all start with a goal SSE. This is the sum of squared errors (or sum of loss function values) that an F test considers significantly different from the solution SSE at the given alpha level. If the loss function is specified to be a negative log-likelihood, a Chi-square quantile is used instead of an F quantile. For each parameter's upper confidence limit, the parameter value is increased until the SSE reaches the goal SSE. As the parameter value moves up, all the other parameters are adjusted to be least squares estimates subject to the change in the profiled parameter. Conceptually, this is a compounded set of nested iterations; internally, JMP uses a method developed by Johnston and DeLong that accomplishes this with a single set of iterations. See SAS/STAT 9.1 vol. 3, pp. 1666-1667.
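For readers who want to see the profiling idea in code, the following is a minimal Python sketch, not JMP's internal Johnston and DeLong one-pass algorithm: it walks one parameter upward, re-fits the remaining parameters at each step, and stops when the constrained SSE reaches a goal SSE derived from an F quantile. The function names, the step size, the goal-SSE formula in the comment, and the use of scipy are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import f as f_dist

def profile_upper_limit(model, x, y, beta_hat, k, alpha=0.05, step=0.01, max_steps=1000):
    """Walk parameter k upward until the constrained SSE reaches the goal SSE."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    n, p = len(y), len(beta_hat)
    dfe = n - p
    sse_hat = np.sum((y - model(x, *beta_hat)) ** 2)
    # Goal SSE from an F quantile (a standard choice: SSE * (1 + F(1-alpha; 1, dfe) / dfe))
    sse_goal = sse_hat * (1.0 + f_dist.ppf(1 - alpha, 1, dfe) / dfe)

    beta_k = beta_hat[k]
    others = np.delete(beta_hat, k)           # current values of the other parameters
    for _ in range(max_steps):
        beta_k += step                        # move the profiled parameter up
        def resid(free, bk=beta_k):
            full = np.insert(free, k, bk)     # re-assemble the full parameter vector
            return y - model(x, *full)
        fit = least_squares(resid, others)    # least squares for the other parameters
        others = fit.x
        if np.sum(fit.fun ** 2) >= sse_goal:  # constrained SSE crossed the goal SSE
            return beta_k                     # upper profile confidence limit
    return np.nan                             # goal not reached within max_steps
```

The lower limit is found the same way, stepping the profiled parameter downward instead.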
Diagram of Confidence Limits for Parameters shows the contour of the goal SSE or negative likelihood, with the least squares (or least loss) solution inside the shaded region:
[Figure: Diagram of Confidence Limits for Parameters]
Suppose that f(β) is the model. Then the Nonlinear platform attempts to minimize the sum of the loss functions written as

L(β) = Σ ρ(f(β))

where the sum is taken over the observations. Differentiating with respect to the parameters gives

∂L/∂β = Σ (∂ρ/∂f)(∂f/∂β)

and the matrix of second derivatives is

∂²L/∂β∂β′ = Σ [ (∂²ρ/∂f²)(∂f/∂β)(∂f/∂β)′ + (∂ρ/∂f)(∂²f/∂β∂β′) ]
The second term is probably small if ρ is the squared residual because the sum of residuals is small; the term is zero if there is an intercept term. For least squares, this second term is what distinguishes Gauss-Newton (which drops it) from Newton-Raphson (which keeps it). In JMP, the second term is calculated only if the Second Derivative option is checked.
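To make the distinction concrete, here is a small Python sketch (not JMP code) for the negative exponential model used later in this section, b0*(1 - Exp(-b1*X)): it builds the Gauss-Newton piece of the Hessian from the model's Jacobian, then adds the residual-weighted second-derivative term that, per the text above, corresponds to what the Second Derivative option includes. The function name and data arguments are illustrative.

```python
# Illustrative sketch: for least squares, rho = r^2 with r = y - f(x, beta).
# The Hessian of the total loss splits into 2*J'J (kept by Gauss-Newton) plus
# a residual-weighted second-derivative term (the extra term Newton-Raphson keeps).
import numpy as np

def hessians(x, y, b0, b1):
    f = b0 * (1 - np.exp(-b1 * x))                      # negative exponential model
    r = y - f                                           # residuals
    # Jacobian of f with respect to (b0, b1), matching the derivative table
    J = np.column_stack([1 - np.exp(-b1 * x),           # df/db0
                         b0 * np.exp(-b1 * x) * x])     # df/db1
    gauss_newton = 2 * J.T @ J                          # first term only
    # Second derivatives of f: d2f/db0^2 = 0, d2f/db0db1 = x*exp(-b1*x),
    # d2f/db1^2 = -b0*x^2*exp(-b1*x)
    H_f = np.array([[np.zeros_like(x), x * np.exp(-b1 * x)],
                    [x * np.exp(-b1 * x), -b0 * x**2 * np.exp(-b1 * x)]])
    second_term = -2 * np.einsum('n,ijn->ij', r, H_f)   # residual-weighted term
    newton_raphson = gauss_newton + second_term
    return gauss_newton, newton_raphson
```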
If you open the Negative Exponential.jmp nonlinear sample data example, the formula for the Nonlinear column is essentially the model b0*(1 - Exp(-b1*X)), with the parameters b0 and b1 and their initial values declared in a Parameter() expression.
The derivative of the model with respect to the parameters is displayed as a table of expressions built from temporary assignments such as T#1 and T#3, described below.
When the prediction model needs additional subexpressions evaluated, it uses the First function, which evaluates all of its arguments but returns the value of only the first one. In this case, the extra arguments hold the assignments needed for the derivatives.
The derivative table itself is a list of expressions, one expression for each parameter to be fit. For example, the derivative of the model with respect to b0 is T#1, which is threaded into the prediction model as 1 - (Exp(-b1*X)). The derivative with respect to b1 is T#3*b0, which is -(-1*Exp(-b1*X)*X)*b0 if you substitute in the temporary assignments. Although many optimizations are made, the operations are not always combined optimally. You can see this in the expression for T#3, which does not remove a double negation.
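As an outside check (not how JMP builds its derivative table), symbolic differentiation of the same model reproduces these expressions; sympy is used here purely for illustration.

```python
# Verify that the analytic derivatives of the negative exponential model match
# the derivative table: d/db0 = 1 - Exp(-b1*X) (T#1) and
# d/db1 = b0*Exp(-b1*X)*X (T#3*b0 once the double negation is simplified away).
import sympy as sp

b0, b1, X = sp.symbols('b0 b1 X')
model = b0 * (1 - sp.exp(-b1 * X))

print(sp.diff(model, b0))   # 1 - exp(-X*b1)
print(sp.diff(model, b1))   # X*b0*exp(-X*b1)
```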
If you ask for second derivatives, you get a list of m(m + 1)/2 second-derivative expressions, where m is the number of parameters (for example, three expressions when m = 2).
If the derivative mechanism does not know how to take the analytic derivative of a function, then it takes numerical derivatives, using the NumDeriv function. If this occurs, the platform shows the delta that it used to evaluate the change in the function with respect to a delta change in the arguments. You might need to experiment with different delta settings to obtain good numerical derivatives.
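The following is a hedged sketch of what a finite-difference derivative with a tunable delta looks like in general; it is not JMP's NumDeriv implementation, and the function name, central-difference form, and default delta are illustrative assumptions. Too large a delta biases the slope, while too small a delta amplifies round-off error, which is why experimenting with delta can help.

```python
import numpy as np

def num_deriv(g, beta, k, delta=1e-6):
    """Central-difference derivative of g with respect to parameter k."""
    up, dn = np.array(beta, float), np.array(beta, float)
    up[k] += delta
    dn[k] -= delta
    return (g(up) - g(dn)) / (2 * delta)

# Example: derivative of the negative exponential model with respect to b1 at X = 1
g = lambda b: b[0] * (1 - np.exp(-b[1] * 1.0))
print(num_deriv(g, [0.5, 0.5], k=1))   # close to b0*X*exp(-b1*X), about 0.3033
```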
a1 + b1x + c1x²
There is no single omnibus optimization method that works well on all problems. JMP provides options such as Newton, QuasiNewton BFGS, QuasiNewton SR1, and Numeric Derivatives Only to expand the range of problems that the Nonlinear platform can solve.
Some models are very sensitive to the starting values of the parameters, so refining the starting values is often effective. Edit the starting values and click Reset to see the effect. The plot often helps: use the sliders to modify the curve visually until it fits better. The parameter profilers can also help, but they might be too slow for anything but small data sets.
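Outside of JMP, one rough way to gauge this sensitivity is to refit from several starting values and compare the resulting SSEs. The sketch below does this for an illustrative negative exponential fit; the simulated data, the grid of starts, and the use of scipy are assumptions, not the platform's workflow.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_from(start, x, y):
    """Fit b0*(1 - exp(-b1*x)) by least squares from a given starting point."""
    resid = lambda b: y - b[0] * (1 - np.exp(-b[1] * x))
    fit = least_squares(resid, start)
    return fit.x, np.sum(fit.fun ** 2)

# Simulated data from b0 = 2.0, b1 = 0.7 with a little noise
x = np.linspace(0.1, 5, 40)
rng = np.random.default_rng(1)
y = 2.0 * (1 - np.exp(-0.7 * x)) + rng.normal(0, 0.05, x.size)

for start in [(0.1, 0.1), (1.0, 1.0), (5.0, 0.01)]:
    beta, sse = fit_from(np.array(start), x, y)
    print(start, "->", np.round(beta, 3), "SSE", round(sse, 4))
```

Comparing the reported SSEs shows whether different starts converge to the same solution or get stuck, which is the same question the Reset workflow and sliders help you answer interactively.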