This chapter explains some of the assumptions made, and the strategies and statistical procedures used, in WOMBAT.
An essential set-up step in REML analyses is to find a permutation of the rows and columns in the coefficient matrix of the mixed model equations that makes the ‘fill-in’ arising during factorisation small, and thus minimises the computational effort per likelihood evaluation and REML iterate. This needs to be done only once per analysis (WOMBAT saves the results from this step for re-use in any subsequent steps). As it can have a dramatic impact on the time and memory required per analysis, it is well worth considerable effort to find the ‘best’ order. Especially for analyses involving large data sets or multiple random effects, the time spent trying several, or even numerous, alternatives is readily recouped within the first few iterates [31]. WOMBAT selects a default ordering strategy based on the number of equations in the analysis.
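As an illustration of how strongly the elimination order drives fill-in, the following sketch uses SciPy's SuperLU interface and a contrived 'arrowhead' matrix; this is not WOMBAT's factorisation nor its ordering strategies, merely a stand-in to show the effect:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 1000
A = sp.lil_matrix((n, n))
A[0, :] = 1.0                      # dense first row and column ('arrowhead')
A[:, 0] = 1.0
A.setdiag(np.full(n, n + 1.0))     # diagonally dominant, hence positive definite
A = A.tocsc()

for spec in ("NATURAL", "COLAMD", "MMD_AT_PLUS_A"):
    lu = splu(A, permc_spec=spec)  # factorise under different column orderings
    print(f"{spec:>14}: nnz(L) + nnz(U) = {lu.L.nnz + lu.U.nnz}")
```

Eliminating the dense row first (the natural order) fills the factor almost completely, whereas the fill-reducing orderings defer it and keep the factor nearly as sparse as A itself; the same effect, at much larger scale, is what makes the ordering step in WOMBAT worthwhile.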
Three different strategies are implemented:
WOMBAT calculates four different criteria to determine whether an analysis has converged. The first two are simple changes, available for all maximisation algorithms; the other two are based on the derivatives of the log likelihood function and can therefore only be obtained for the AI algorithm. The criteria are:

1. The change in log likelihood values between subsequent iterates, log ℒ_t − log ℒ_{t−1}.

2. The change in the parameter estimates,
$$\delta\!\left(\boldsymbol{\theta}^{t}\right) = \sqrt{\sum_{i=1}^{p}\left(\theta_i^{t}-\theta_i^{t-1}\right)^{2}\Bigg/\sum_{i=1}^{p}\left(\theta_i^{t}\right)^{2}} \tag{A.1}$$
where θ_i^t denotes the estimate of the i-th parameter from iterate t, and p is the number of parameters.
3. The norm of the gradient vector,

$$\left\|\mathbf{g}_t\right\| = \sqrt{\sum_{i=1}^{p}\left(\frac{\partial \log\mathcal{L}_t}{\partial\theta_i}\right)^{2}} \tag{A.2}$$
4. The ‘Newton decrement’,

$$D_t = \sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial \log\mathcal{L}_t}{\partial\theta_i}\,H_t^{ij}\,\frac{\partial \log\mathcal{L}_t}{\partial\theta_j} \tag{A.3}$$
where H_t^{ij} is the ij-th element of the inverse of the average information matrix for iterate t. This gives a measure of the expected difference of log ℒ_t from the maximum, and has been suggested as an alternative convergence criterion [6].
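In code, the four criteria might be evaluated as in the sketch below; the argument names are hypothetical stand-ins for the current and previous estimates, log likelihood values, gradient and AI matrix carried between iterates:

```python
import numpy as np

def convergence_criteria(theta, theta_prev, logL, logL_prev, g=None, AI=None):
    """Evaluate the four criteria; g and AI (gradient vector and average
    information matrix) are only available under the AI algorithm."""
    crit = {"change in logL": logL - logL_prev,
            # Eq. (A.1): scaled change in the parameter vector
            "change in parameters": np.sqrt(np.sum((theta - theta_prev) ** 2)
                                            / np.sum(theta ** 2))}
    if g is not None:
        # Eq. (A.2): norm of the gradient vector
        crit["norm of gradient"] = np.sqrt(np.sum(g ** 2))
        if AI is not None:
            # Eq. (A.3): Newton decrement, g' H^{-1} g with H the AI matrix
            crit["Newton decrement"] = g @ np.linalg.solve(AI, g)
    return crit
```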
Table A.1: Default thresholds for the convergence criteria, by maximisation algorithm

| Criterion | AI | (PX-)EM | Powell |
| --- | --- | --- | --- |
| Change in log ℒ | < 5 × 10⁻⁴ | < 10⁻⁵ | < 10⁻⁴ |
| Change in parameters | < 10⁻⁸ | < 10⁻⁸ | < 10⁻⁸ |
| Norm of gradient vector | < 10⁻³ | – | – |
| Newton decrement | not used | – | – |
Default values for the thresholds for these criteria used in WOMBAT are summarised in Table A.1.
N.B.: Current values used are rather stringent; ‘softer’ limits combining several criteria may be more appropriate for practical estimation.
At convergence, the inverse of the AI matrix provides lower-bound estimates of the sampling covariances among the parameters estimated. These are used to approximate sampling errors of covariance components and genetic parameters.
For full rank analyses, parameterised in terms of the elements of the Cholesky factors of the covariance matrices to be estimated, the AI matrix is obtained by first calculating the AI matrix for the covariance components. This is then transformed into the AI matrix for the parameters to be estimated by pre- and post-multiplying it with the corresponding Jacobian and its transpose, respectively. Full details are given in Meyer and Smith [39]. This implies that sampling covariances among the covariance components can be obtained directly, by simply inverting the corresponding AI matrix.
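A minimal sketch of this transformation, assuming the AI matrix for the covariance components (AI_sigma, a hypothetical name) is available and Σ = LL′ with the parameters being the lower-triangle elements of L, might look as follows; it computes the Jacobian J = ∂σ/∂θ analytically and is an illustration, not WOMBAT's implementation:

```python
import numpy as np

def vech(M):
    """Stack the lower triangle of a symmetric matrix into a vector."""
    i, j = np.tril_indices(M.shape[0])
    return M[i, j]

def jacobian_sigma_wrt_chol(L):
    """J[a, b] = d vech(L L')_a / d vech(L)_b for a full-rank Cholesky factor."""
    q = L.shape[0]
    params = list(zip(*np.tril_indices(q)))  # (row, col) of each parameter l_km
    J = np.empty((len(params), len(params)))
    for b, (k, m) in enumerate(params):
        dL = np.zeros((q, q))
        dL[k, m] = 1.0                       # perturb parameter l_km
        J[:, b] = vech(dL @ L.T + L @ dL.T)  # d(LL')/d l_km
    return J

# Pre- and post-multiplication, as in the text:
# AI_theta = J.T @ AI_sigma @ J
```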
For reduced rank analyses, however, the AI matrix for the parameters to be estimated is calculated directly, as outlined by Meyer and Kirkpatrick [36]. Hence, sampling covariances among the corresponding covariance components need to be approximated from the inverse of the AI matrix. WOMBAT estimates the leading columns of the Cholesky factor (L) of a matrix to obtain a reduced rank estimate, Σ = LL′. Let l_ir (for i ≥ r) denote the non-zero elements of L. The inverse of the AI matrix then gives approximate sampling covariances Cov(l_ir, l_js) among these elements. The ij-th covariance component, σ_ij, is
$$\sigma_{ij} = \sum_{r=1}^{q(i,j)} l_{ir}\, l_{jr}$$
with q(i,j) = min(i,j,t) and t the rank which the estimate of Σ is set to have. The covariance between two covariance components, σ_ij and σ_km, is then
$$\operatorname{Cov}\!\left(\sigma_{ij},\,\sigma_{km}\right) = \operatorname{Cov}\!\left(\sum_{r=1}^{q(i,j)} l_{ir} l_{jr},\;\sum_{s=1}^{q(k,m)} l_{ks} l_{ms}\right)$$
Using a first order Taylor series approximation to the product of two variables, this can be approximated as

$$\operatorname{Cov}\!\left(\sigma_{ij},\,\sigma_{km}\right) \approx \sum_{r=1}^{q(i,j)}\sum_{s=1}^{q(k,m)}\Big[\, l_{ir} l_{ks}\operatorname{Cov}(l_{jr}, l_{ms}) + l_{ir} l_{ms}\operatorname{Cov}(l_{jr}, l_{ks}) + l_{jr} l_{ks}\operatorname{Cov}(l_{ir}, l_{ms}) + l_{jr} l_{ms}\operatorname{Cov}(l_{ir}, l_{ks}) \Big] \tag{A.4}$$
Equation (A.4) extends readily to two covariance components belonging to different covariance matrices, Σ₁ and Σ₂, and their respective Cholesky factors.
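Equation (A.4) translates directly into code. In the sketch below, cov_l(i, r, j, s) is a hypothetical accessor returning the approximate sampling covariance Cov(l_ir, l_js) extracted from the inverse of the AI matrix, L holds the estimated Cholesky factor, and t is the chosen rank; indices are 0-based, so the summation bound q(i,j) = min(i,j,t) becomes min(i, j, t − 1) + 1:

```python
def cov_sigma(i, j, k, m, L, t, cov_l):
    """First-order approximation (A.4) to Cov(sigma_ij, sigma_km), 0-based."""
    total = 0.0
    for r in range(min(i, j, t - 1) + 1):      # r runs over the q(i,j) terms
        for s in range(min(k, m, t - 1) + 1):  # s runs over the q(k,m) terms
            total += (L[i, r] * L[k, s] * cov_l(j, r, m, s)
                      + L[i, r] * L[m, s] * cov_l(j, r, k, s)
                      + L[j, r] * L[k, s] * cov_l(i, r, m, s)
                      + L[j, r] * L[m, s] * cov_l(i, r, k, s))
    return total
```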
Let σ denote the vector of covariance components in the model of analysis, and V(σ) its approximate matrix of sampling covariances, obtained as described above (subsection A.4.1). The sampling covariance for any pair of linear functions of σ is then simply

$$\operatorname{Cov}\!\left(\mathbf{w}_i'\boldsymbol{\sigma},\,\mathbf{w}_j'\boldsymbol{\sigma}\right) = \mathbf{w}_i'\,V(\boldsymbol{\sigma})\,\mathbf{w}_j \tag{A.5}$$

with w_i the vector of weights defining linear function i.
Sampling variances of non-linear functions of covariance components are obtained by first approximating the function by a first order Taylor series expansion, and then calculating the variance of the resulting linear function. For example, for a variance ratio σ_x²/σ_y²,

$$\operatorname{Var}\!\left(\frac{\sigma_x^2}{\sigma_y^2}\right) \approx \left(\frac{\sigma_x^2}{\sigma_y^2}\right)^{2}\left[\frac{\operatorname{Var}(\sigma_x^2)}{\left(\sigma_x^2\right)^{2}} - \frac{2\operatorname{Cov}(\sigma_x^2,\sigma_y^2)}{\sigma_x^2\,\sigma_y^2} + \frac{\operatorname{Var}(\sigma_y^2)}{\left(\sigma_y^2\right)^{2}}\right] \tag{A.6}$$
Similarly, for a correlation σ_xy/√(σ_x² σ_y²),

$$\operatorname{Var}\!\left(\frac{\sigma_{xy}}{\sqrt{\sigma_x^2\sigma_y^2}}\right) \approx \frac{\sigma_{xy}^2}{\sigma_x^2\sigma_y^2}\left[\frac{\operatorname{Var}(\sigma_{xy})}{\sigma_{xy}^{2}} + \frac{\operatorname{Var}(\sigma_x^2)}{4\left(\sigma_x^2\right)^{2}} + \frac{\operatorname{Var}(\sigma_y^2)}{4\left(\sigma_y^2\right)^{2}} - \frac{\operatorname{Cov}(\sigma_{xy},\sigma_x^2)}{\sigma_{xy}\,\sigma_x^2} - \frac{\operatorname{Cov}(\sigma_{xy},\sigma_y^2)}{\sigma_{xy}\,\sigma_y^2} + \frac{\operatorname{Cov}(\sigma_x^2,\sigma_y^2)}{2\,\sigma_x^2\,\sigma_y^2}\right] \tag{A.7}$$
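The sketch below applies Equations (A.5) to (A.7), assuming sigma holds the estimated covariance components and V their approximate sampling covariance matrix from subsection A.4.1; the function names and index arguments (positions of the relevant components within sigma) are hypothetical:

```python
import numpy as np

def var_linear(w, V):
    """Eq. (A.5) with i = j: sampling variance of the linear function w' sigma."""
    return w @ V @ w

def var_ratio(sigma, V, x, y):
    """Eq. (A.6): delta-method variance of the ratio sigma[x] / sigma[y]."""
    r = sigma[x] / sigma[y]
    return r ** 2 * (V[x, x] / sigma[x] ** 2
                     - 2.0 * V[x, y] / (sigma[x] * sigma[y])
                     + V[y, y] / sigma[y] ** 2)

def var_correlation(sigma, V, xy, xx, yy):
    """Eq. (A.7): delta-method variance of sigma[xy] / sqrt(sigma[xx] sigma[yy])."""
    r2 = sigma[xy] ** 2 / (sigma[xx] * sigma[yy])
    return r2 * (V[xy, xy] / sigma[xy] ** 2
                 + V[xx, xx] / (4 * sigma[xx] ** 2)
                 + V[yy, yy] / (4 * sigma[yy] ** 2)
                 - V[xy, xx] / (sigma[xy] * sigma[xx])
                 - V[xy, yy] / (sigma[xy] * sigma[yy])
                 + V[xx, yy] / (2 * sigma[xx] * sigma[yy]))
```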
To yield a search direction which is likely to improve the likelihood or, equivalently, decrease −log ℒ, the Hessian matrix or its approximation in a Newton-type optimisation strategy must be positive definite. While the AI matrix is a matrix of sums of squares and crossproducts, and thus virtually guaranteed to be positive definite, it can have a relatively large condition number or a minimum eigenvalue close to zero. This can yield step sizes, calculated as the product of the inverse of the AI matrix and the vector of first derivatives, which are too large. Consequently, severe step size modifications may be required to achieve an improvement in log ℒ. This may, at best, require several additional likelihood evaluations, or cause the algorithm to fail. Modifying the AI matrix, to ensure that it is ‘safely’ positive definite and that its condition number is not excessive, may improve performance of the AI algorithm in this instance.
Several modification strategies are available; none has been found to be ‘best’ in general. The choice of modification can have a substantial effect on the efficiency of the AI algorithm; in particular, too large a modification can slow convergence rates unnecessarily. Further experience is needed to determine good choices of modification for specific cases.
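As one example of such a modification (a sketch only, not necessarily the strategy WOMBAT applies), the eigenvalues of the AI matrix can be floored so that the modified matrix is safely positive definite with a bounded condition number:

```python
import numpy as np

def modify_ai(AI, max_condition=1e6):
    """Floor eigenvalues so the condition number of AI stays bounded."""
    eigval, eigvec = np.linalg.eigh(AI)   # AI is symmetric
    floor = eigval.max() / max_condition  # smallest admissible eigenvalue
    eigval = np.maximum(eigval, floor)
    return (eigvec * eigval) @ eigvec.T   # reassemble V diag(lambda) V'
```

Larger values of max_condition (a hypothetical parameter) modify the matrix less; the trade-off mirrors the remark above that too large a modification slows convergence unnecessarily.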
Consider the q × q covariance matrix V (among q traits) for a random effect, e.g. additive genetic effects. Assume we have S analyses of subsets of traits, with the s-th analysis comprising k_s traits. Further, let C_s denote the k_s × k_s matrix of estimates of covariance components for the random effect from analysis s. The pooled matrix V is constructed by iterating on
$$V^{t+1} = \left(\sum_{s=1}^{S} w_s\right)^{-1}\sum_{s=1}^{S} w_s\left[\,P_s\,C_s\,P_s' + \left(I-P_sP_s'\right) V^{t} \left(I-P_sP_s'\right)\right] \tag{A.8}$$
until V^t and V^{t+1} are virtually identical, with V^t denoting the value of V at the t-th iterate.
Other components of Equation (A.8) are: w_s, the weight given to analysis s; I, an identity matrix of size q × q; and the transformation matrix P_s for analysis s, of size q × k_s. P_s has k_s elements of unity, p_ij = 1 if the i-th trait overall is the j-th trait in analysis s, and zero otherwise.
Starting values (V^0) are obtained by decomposing the covariance matrices C_s into variances and correlations, and calculating simple averages over the individual analyses. If the resulting correlation matrix is not positive definite, it is modified by regressing all eigenvalues towards their mean, choosing the regression factor so that the smallest eigenvalue becomes 10⁻⁶, and pre- and post-multiplying the diagonal matrix of modified eigenvalues by the matrix of eigenvectors and its transpose, respectively. Using the average variances, V^0 is then obtained from the (modified) average correlation matrix. WOMBAT is set up to perform up to 100000 iterates. Convergence is assumed to be reached when
$$\sum_{i=1}^{q}\sum_{j=1}^{q}\left(v_{ij}^{t+1}-v_{ij}^{t}\right)^{2}\Bigg/\sum_{i=1}^{q}\sum_{j=1}^{q}\left(v_{ij}^{t}\right)^{2} < \epsilon$$

with v_ij^t denoting the ij-th element of V^t, and ε a small positive threshold.
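The complete pooling iteration is compact enough to sketch in full. The code below assumes Equation (A.8) in the form given above, with parts a hypothetical list of triples (C_s, overall trait numbers in analysis s, 0-based, w_s) and V0 the starting value described in the previous paragraph:

```python
import numpy as np

def pool_covariances(parts, V0, max_iter=100000, eps=1e-8):
    """Iterate Equation (A.8) until the relative change in V is below eps."""
    q = V0.shape[0]
    V = V0.copy()
    wsum = sum(w for _, _, w in parts)
    for _ in range(max_iter):
        Vnew = np.zeros((q, q))
        for C, traits, w in parts:
            P = np.zeros((q, len(traits)))
            P[traits, np.arange(len(traits))] = 1.0  # p_ij = 1 as defined above
            Q = np.eye(q) - P @ P.T                  # zeroes traits in analysis s
            Vnew += w * (P @ C @ P.T + Q @ V @ Q)
        Vnew /= wsum
        # relative change in the elements of V, cf. the criterion above
        if np.sum((Vnew - V) ** 2) < eps ** 2 * np.sum(V ** 2):
            return Vnew
        V = Vnew
    return V
```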