A.5 Modification of the average information matrix

To yield a search direction which is likely to improve the likelihood, or, equivalently, decrease −logℒ, the Hessian matrix or its approximation in a Newton-type optimisation strategy must be positive definite. While the AI matrix is a matrix of sums of squares and crossproducts and thus virtually guaranteed to be positive definite, it can have a relatively large condition number or minimum eigenvalues close to zero. This can yield step sizes, calculated as the product of the inverse of the AI matrix and the vector of first derivatives, which are too large. Consequently, severe step size modifications may be required to achieve an improvement in logℒ. This may, at best, require several additional likelihood evaluations or, at worst, cause the algorithm to fail. Modification of the AI matrix, to ensure that it is ‘safely’ positive definite and that its condition number is not excessive, may improve performance of the AI algorithm in this instance.
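The effect of a near-zero eigenvalue on the step can be illustrated with a minimal sketch. It assumes NumPy; the function names `ai_step` and `condition_number` and the 2 × 2 matrix are hypothetical choices for illustration and are not taken from the program itself. The sign applied to the step depends on whether logℒ or −logℒ is being optimised and is left out here.

```python
import numpy as np

def ai_step(ai, grad):
    """Product of the inverse AI matrix and the gradient.

    Solves ai @ step = grad rather than forming the inverse explicitly.
    """
    return np.linalg.solve(ai, grad)

def condition_number(ai):
    """Ratio of largest to smallest eigenvalue of a symmetric matrix."""
    eigvals = np.linalg.eigvalsh(ai)  # returned in ascending order
    return eigvals[-1] / eigvals[0]

# Hypothetical AI matrix with eigenvalues 1.999 and 0.001.
ai = np.array([[1.0, 0.999],
               [0.999, 1.0]])
# Hypothetical gradient lying along the eigenvector of the small eigenvalue.
grad = np.array([1.0, -1.0])

step = ai_step(ai, grad)
cond = condition_number(ai)
```

In this example the gradient has elements of size 1, but the step has elements of size about 1000, because the gradient is divided by the smallest eigenvalue (0.001). The condition number is about 1999.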

Several strategies are available. None has been found to be ‘best’.

1. Schnabel and Eskow [29] described a modified Cholesky decomposition of the Hessian matrix. This has been implemented using algorithm 695 of the TOMS library (www.netlib.org/toms) [5], but using a factor of ϵ^(−1∕2) (where ϵ denotes machine precision) to determine the critical size of pivots, which is intermediate to the original value of ϵ^(−2∕3) and the value of ϵ^(1∕3) suggested by Schnabel and Eskow [30].
2. A partial Cholesky decomposition has been suggested by Forsgren et al. [8]. This has been implemented using a factor of ν = 0.998.
3. Modification strategies utilising the Cholesky decomposition have been devised for scenarios where direct calculation of the eigenvalues is impractical. For our applications, however, computational costs of an eigenvalue decomposition of the AI matrix are negligible compared to those of a likelihood evaluation. This allows a modification where we know the minimum eigenvalue of the resulting matrix. Nocedal and Wright [26, Chapter 6] described two variations, which have been implemented.
   (a) Set all eigenvalues less than a value of δ to δ, and construct the modified AI matrix by pre- and postmultiplying the diagonal matrix of eigenvalues with the matrix of eigenvectors and its transpose, respectively.
   (b) Add a diagonal matrix τI to the AI matrix, with τ = max(0, δ − λmin) and λmin the smallest eigenvalue of the AI matrix. This has been chosen as the default procedure, with δ bigger than 3 × 10^(−6) × λ1, and λ1 the largest eigenvalue of the AI matrix.
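The two eigenvalue-based variations in item 3 can be sketched as follows. This is an illustration under stated assumptions, not the program's own code: it assumes NumPy, the function names `floor_eigenvalues` and `shift_diagonal` and the example matrix are hypothetical, and δ is set equal to the bound 3 × 10^(−6) × λ1 purely for illustration (the text only says that δ is bigger than this value).

```python
import numpy as np

def floor_eigenvalues(ai, delta):
    """Variation (a): set every eigenvalue below delta to delta."""
    eigvals, eigvecs = np.linalg.eigh(ai)      # ai assumed symmetric
    floored = np.maximum(eigvals, delta)
    # Pre- and postmultiply the diagonal matrix of eigenvalues by the
    # matrix of eigenvectors and its transpose.
    return eigvecs @ np.diag(floored) @ eigvecs.T

def shift_diagonal(ai, delta):
    """Variation (b): add tau * I with tau = max(0, delta - lambda_min)."""
    lambda_min = np.linalg.eigvalsh(ai)[0]     # eigenvalues are ascending
    tau = max(0.0, delta - lambda_min)
    return ai + tau * np.eye(ai.shape[0])

# Hypothetical AI matrix with eigenvalues of about 2 and 1e-7.
ai = np.array([[1.0, 0.9999999],
               [0.9999999, 1.0]])
lambda_1 = np.linalg.eigvalsh(ai)[-1]          # largest eigenvalue
delta = 3e-6 * lambda_1                        # illustrative choice of delta

mod_a = floor_eigenvalues(ai, delta)
mod_b = shift_diagonal(ai, delta)
```

Both variations leave a matrix whose smallest eigenvalue is δ. Variation (a) changes only the eigenvalues below δ, while variation (b) increases every eigenvalue by τ. A matrix whose smallest eigenvalue already exceeds δ is returned unchanged by either function.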

Choice of the modification can have a substantial effect on the efficiency of the AI algorithm. In particular, too large a modification can slow convergence rates unnecessarily. Further experience is necessary to determine which is a good choice of modification for specific cases.
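One way to see why too large a modification slows convergence is to write the step for the diagonal modification τI in terms of the eigendecomposition of the AI matrix. The notation here is assumed for illustration only: H denotes the AI matrix, g the vector of first derivatives, and λ_i and v_i the eigenvalues and eigenvectors of H.

```latex
\[
s(\tau) = (H + \tau I)^{-1} g
        = \sum_{i} \frac{v_i^{\top} g}{\lambda_i + \tau}\, v_i ,
\qquad
s(\tau) \approx \frac{1}{\tau}\, g \quad \text{for } \tau \gg \lambda_1 .
\]
```

For τ = 0, components along eigenvectors with λ_i close to zero are inflated, which gives the overly large steps described above. As τ grows, every component is damped, and for τ much larger than λ1 the step approaches a short step along the gradient, so the curvature information in the AI matrix is largely lost.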