Appendix A
Technical details



This chapter explains some of the assumptions made, strategies and statistical procedures used in WOMBAT.

A.1 Ordering strategies

An essential set-up step in REML analyses is to find a permutation of the rows and columns of the coefficient matrix of the mixed model equations that keeps the ‘fill-in’ arising during factorisation small and thus minimises the computational effort per likelihood evaluation and REML iterate. This needs to be done only once per analysis (WOMBAT saves the results from this step for re-use in any subsequent steps). As the ordering can have a dramatic impact on the time and memory required per analysis, it is well worth considerable effort to find the ‘best’ order. Especially for analyses involving large data sets or multiple random effects, the time spent trying several, or even numerous, alternatives is readily recouped within the first few iterates [31]. WOMBAT selects a default ordering strategy based on the number of equations in the analysis.

Three different strategies are implemented:

1.
The multiple minimum degree procedure [26], as implemented in the widely used public domain subroutine genmmd. This is the strategy which has been used in DfReml. For WOMBAT, it is the default for small analyses involving up to 25000 equations; a toy sketch of the underlying minimum degree idea follows this list.
2.
The approximate minimum degree ordering of Amestoy et al. [2]. This tends to produce orderings of similar quality to the multiple minimum degree procedure, but is considerably faster. Implementation is through the public domain subroutine amd (version 1.1), available at www.cise.ufl.edu/research/sparse/amd. This is the default for analyses involving more than 25000 and up to 50000 equations.
3.
A multilevel nested dissection procedure, as implemented in subroutine metis_nodend from the (public domain) MeTis package of Karypis and Kumar [22], available at www.cs.umn.edu/karypis/metis. This is the default for large analyses.
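
To illustrate the idea underlying these strategies, the sketch below implements a naive (single elimination) minimum degree ordering in Python and counts the fill-in it produces. It is purely a toy illustration under simplifying assumptions; genmmd, amd and metis_nodend implement far more refined and vastly faster versions of such heuristics, and the function and variable names here are made up for the example.

```python
def minimum_degree_order(adj):
    """Greedy minimum degree ordering for a symmetric sparsity pattern.

    adj: dict mapping each node to the set of its neighbours (no self-edges).
    Returns the elimination order and the number of fill-in entries created.
    """
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    order, fill = [], 0
    remaining = set(adj)
    while remaining:
        # eliminate the node of currently smallest degree (ties: smallest label)
        v = min(remaining, key=lambda x: (len(adj[x]), x))
        nbrs = adj[v]
        for a in nbrs:                 # neighbours of v become a clique;
            for b in nbrs:             # each new edge corresponds to fill-in
                if a < b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
        for a in nbrs:                 # remove v from the remaining graph
            adj[a].discard(v)
        remaining.discard(v)
        order.append(v)
    return order, fill


# toy 'arrow-head' pattern: eliminating node 0 first creates 6 fill-in entries,
# whereas the minimum degree order ([1, 2, 3, 4, 0]) creates none
pattern = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(minimum_degree_order(pattern))
```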

A.2 Convergence criteria

WOMBAT calculates four different criteria to determine whether an analysis has converged. The first two are simple changes, available for all maximisation algorithms; the other two are based on the derivatives of the log likelihood function and can thus only be obtained for the AI algorithm. The criteria are:

1.
The increase in log likelihood values between subsequent iterates, i.e. $\log\mathcal{L}^t - \log\mathcal{L}^{t-1}$, with $\log\mathcal{L}^t$ the log likelihood for iterate $t$.
2.
The change in the vector of parameter estimates from the last iterate. This is evaluated as

$$\sqrt{\,\sum_{i=1}^{p}\left(\hat{\theta}_i^t - \hat{\theta}_i^{t-1}\right)^2 \Big/ \sum_{i=1}^{p}\left(\hat{\theta}_i^t\right)^2\,} \qquad\text{(A.1)}$$

where $\hat{\theta}_i^t$ denotes the estimate of the $i$th parameter from iterate $t$, and $p$ is the number of parameters.

3.
The norm of the gradient vector, i.e., for $g_i^t = \partial\log\mathcal{L}^t/\partial\theta_i$,

$$\sqrt{\,\sum_{i=1}^{p}\left(g_i^t\right)^2\,} \qquad\text{(A.2)}$$

4.
The ‘Newton decrement’, i.e.

$$-\sum_{i=1}^{p}\sum_{j=1}^{p} g_i^t\, g_j^t\, H_{ij}^t \qquad\text{(A.3)}$$

where $H_{ij}^t$ is the $ij$th element of the inverse of the average information matrix for iterate $t$. This gives a measure of the expected difference of $\log\mathcal{L}^t$ from the maximum, and has been suggested as an alternative convergence criterion [6].

Table A.1: Default thresholds for convergence criteria

                                            Algorithm
  Criterion                    AI             (PX-)EM       Powell
  Change in log L              < 5 × 10⁻⁴     < 10⁻⁵        < 10⁻⁴
  Change in parameters         < 10⁻⁸         < 10⁻⁸        < 10⁻⁸
  Norm of gradient vector      < 10⁻³         –             –
  Newton decrement             not used       –             –

Default values for the thresholds for these criteria used in WOMBAT are summarised in Table A.1.

N.B.: Current values used are rather stringent; ‘softer’ limits combining several criteria may be more appropriate for practical estimation.
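
The sketch below (plain Python/NumPy with made-up variable names, not WOMBAT's internals) shows how the four criteria listed above could be evaluated for an AI iterate and compared against the default thresholds of Table A.1.

```python
import numpy as np


def convergence_criteria(logl_t, logl_prev, theta_t, theta_prev, grad, ai_inv):
    """Return the four convergence criteria of Section A.2 for iterate t."""
    # 1. increase in log likelihood between subsequent iterates
    delta_logl = logl_t - logl_prev
    # 2. relative change in the parameter vector (Equation A.1)
    delta_theta = np.sqrt(np.sum((theta_t - theta_prev) ** 2)
                          / np.sum(theta_t ** 2))
    # 3. norm of the gradient vector (Equation A.2)
    grad_norm = np.sqrt(np.sum(grad ** 2))
    # 4. 'Newton decrement' (Equation A.3); ai_inv is the inverted AI matrix
    newton_decrement = -grad @ ai_inv @ grad
    return delta_logl, delta_theta, grad_norm, newton_decrement


# example check against the AI column of Table A.1 (Newton decrement not used)
dl, dtheta, gnorm, _ = convergence_criteria(
    -1234.5678, -1234.5681,
    np.array([1.00, 0.50, 0.25]), np.array([1.00, 0.50, 0.25]),
    np.array([1.0e-4, -2.0e-4, 5.0e-5]),
    np.linalg.inv(np.array([[2.0, 0.1, 0.0],
                            [0.1, 1.0, 0.0],
                            [0.0, 0.0, 0.5]])))
converged = (dl < 5.0e-4) and (dtheta < 1.0e-8) and (gnorm < 1.0e-3)
```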

A.3 Parameterisation

A.4 Approximation of sampling errors

A.4.1 Sampling covariances

At convergence, the inverse of the AI matrix gives estimates of lower bound sampling covariances among the parameters estimated. These are used to approximate sampling errors of covariance components and genetic parameters.

For full rank analyses parameterising to the elements of the Cholesky factors of the corresponding covariance matrices, the AI matrix is obtained by first calculating the AI matrix for the covariance components. This is then transformed to the AI matrix for the parameters to be estimated by pre- and postmultiplying it with the corresponding Jacobian and its transpose, respectively. Full details are given in Meyer and Smith [39]. This implies that sampling covariances among the covariance components can be obtained directly by simply inverting the corresponding AI matrix.

For reduced rank analyses, however, the AI matrix for the parameters to be estimated is calculated directly, as outlined by Meyer and Kirkpatrick [36]. Hence, sampling covariances among the corresponding covariance components need to be approximated from the inverse of the AI matrix. WOMBAT estimates the leading columns of the Cholesky factor ($\mathbf{L}$) of a matrix to obtain a reduced rank estimate, $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}'$. Let $l_{ir}$ (for $i \ge r$) denote the non-zero elements of $\mathbf{L}$. The inverse of the AI matrix then gives approximate sampling covariances $\mathrm{Cov}(l_{ir}, l_{js})$. The $ij$th covariance component, $\sigma_{ij}$, is

$$\sigma_{ij} = \sum_{r=1}^{q(i,j)} l_{ir}\, l_{jr}$$

with $q(i,j) = \min(i,j,t)$ and $t$ the rank which the estimate of $\boldsymbol{\Sigma}$ is set to have. The covariance between two covariance components, $\sigma_{ij}$ and $\sigma_{km}$, is then

$$\mathrm{Cov}\left(\sigma_{ij}, \sigma_{km}\right) = \sum_{r=1}^{q(i,j)} \sum_{s=1}^{q(k,m)} \mathrm{Cov}\left(l_{ir} l_{jr},\, l_{ks} l_{ms}\right)$$

Using a first order Taylor series approximation to the product of two variables, this can be approximated as

$$\mathrm{Cov}\left(\sigma_{ij}, \sigma_{km}\right) \approx \sum_{r=1}^{q(i,j)} \sum_{s=1}^{q(k,m)} \left[\, l_{jr} l_{ms}\,\mathrm{Cov}(l_{ir}, l_{ks}) + l_{jr} l_{ks}\,\mathrm{Cov}(l_{ir}, l_{ms}) + l_{ir} l_{ms}\,\mathrm{Cov}(l_{jr}, l_{ks}) + l_{ir} l_{ks}\,\mathrm{Cov}(l_{jr}, l_{ms}) \,\right] \qquad\text{(A.4)}$$

Equation A.4 extends readily to two covariance components belonging to different covariance matrices, Σ1 and Σ2, and their respective Cholesky factors.
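
As an illustration of Equation A.4, the following sketch approximates Cov(σij, σkm) from a reduced rank Cholesky factor and the sampling covariances of its non-zero elements. It is a minimal sketch rather than WOMBAT code: indices are 0-based, L is assumed to be a NumPy array holding the leading t columns of the Cholesky factor, and cov_l a user-supplied function returning Cov(l_ir, l_ks) as read from the inverse of the AI matrix.

```python
def cov_sigma(L, cov_l, i, j, k, m, rank):
    """Approximate Cov(sigma_ij, sigma_km) via Equation A.4 (0-based indices)."""
    q_ij = min(i + 1, j + 1, rank)    # q(i,j): number of non-zero products
    q_km = min(k + 1, m + 1, rank)
    total = 0.0
    for r in range(q_ij):
        for s in range(q_km):
            # first order expansion of Cov(l_ir * l_jr, l_ks * l_ms)
            total += (L[j, r] * L[m, s] * cov_l((i, r), (k, s))
                      + L[j, r] * L[k, s] * cov_l((i, r), (m, s))
                      + L[i, r] * L[m, s] * cov_l((j, r), (k, s))
                      + L[i, r] * L[k, s] * cov_l((j, r), (m, s)))
    return total
```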

A.4.2 Sampling errors of genetic parameters

Let $\boldsymbol{\sigma}$ denote the vector of covariance components in the model of analysis and $V(\boldsymbol{\sigma})$ its approximate matrix of sampling covariances, obtained as described above (subsection A.4.1). The sampling covariance for any pair of linear functions of $\boldsymbol{\sigma}$ is then simply

$$\mathrm{Cov}\left(\mathbf{w}_1'\boldsymbol{\sigma},\, \mathbf{w}_2'\boldsymbol{\sigma}\right) = \mathbf{w}_1'\, V(\boldsymbol{\sigma})\, \mathbf{w}_2 \qquad\text{(A.5)}$$

with $\mathbf{w}_i$ the vector of weights in linear function $i$.

Sampling variances of non-linear functions of covariance components are obtained by first approximating the function by a first order Taylor series expansion, and then calculating the variance of the resulting linear function. For example, for a variance ratio

$$\mathrm{Var}\!\left(\frac{\sigma_1^2}{\sigma_2^2}\right) \approx \left[\sigma_2^4\,\mathrm{Var}(\sigma_1^2) + \sigma_1^4\,\mathrm{Var}(\sigma_2^2) - 2\,\sigma_1^2\sigma_2^2\,\mathrm{Cov}(\sigma_1^2, \sigma_2^2)\right]\big/\sigma_2^8 \qquad\text{(A.6)}$$

Similarly, for a correlation

$$\mathrm{Var}\!\left(\frac{\sigma_{12}}{\sqrt{\sigma_1^2\sigma_2^2}}\right) \approx r^2\left[\frac{\mathrm{Var}(\sigma_{12})}{\sigma_{12}^2} + \frac{\mathrm{Var}(\sigma_1^2)}{4\sigma_1^4} + \frac{\mathrm{Var}(\sigma_2^2)}{4\sigma_2^4} - \frac{\mathrm{Cov}(\sigma_{12},\sigma_1^2)}{\sigma_{12}\sigma_1^2} - \frac{\mathrm{Cov}(\sigma_{12},\sigma_2^2)}{\sigma_{12}\sigma_2^2} + \frac{\mathrm{Cov}(\sigma_1^2,\sigma_2^2)}{2\sigma_1^2\sigma_2^2}\right] \qquad\text{(A.7)}$$

with $r = \sigma_{12}/\sqrt{\sigma_1^2\sigma_2^2}$ the corresponding correlation.
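
The following sketch (illustrative only, with made-up names) evaluates Equations A.5 to A.7 for given covariance components and their approximate sampling (co)variances; here v1, v2 and c12 stand for σ1², σ2² and σ12, respectively.

```python
import numpy as np


def cov_linear(w1, w2, V):
    """Cov(w1' sigma, w2' sigma) = w1' V(sigma) w2 (Equation A.5)."""
    return w1 @ V @ w2


def var_ratio(v1, v2, var_v1, var_v2, cov_v1_v2):
    """Var(v1 / v2), e.g. a variance ratio such as a heritability (Eq. A.6)."""
    return (v2 ** 2 * var_v1 + v1 ** 2 * var_v2
            - 2.0 * v1 * v2 * cov_v1_v2) / v2 ** 4


def var_correlation(c12, v1, v2, var_c12, var_v1, var_v2,
                    cov_c12_v1, cov_c12_v2, cov_v1_v2):
    """Var(r) for r = c12 / sqrt(v1 * v2), to first order (Equation A.7)."""
    r = c12 / np.sqrt(v1 * v2)
    return r ** 2 * (var_c12 / c12 ** 2
                     + var_v1 / (4.0 * v1 ** 2)
                     + var_v2 / (4.0 * v2 ** 2)
                     - cov_c12_v1 / (c12 * v1)
                     - cov_c12_v2 / (c12 * v2)
                     + cov_v1_v2 / (2.0 * v1 * v2))
```

Note that var_ratio is Equation A.6 written directly in terms of the two variances (v1 = σ1², v2 = σ2²), so the powers 4 and 8 of σ in the text become powers 2 and 4 of the arguments here.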

A.5 Modification of the average information matrix

To yield a search direction which is likely to improve the likelihood, or, equivalently, decrease $-\log\mathcal{L}$, the Hessian matrix or its approximation in a Newton type optimisation strategy must be positive definite. While the AI matrix is a matrix of sums of squares and crossproducts and thus virtually guaranteed to be positive definite, it can have a relatively large condition number or minimum eigenvalues close to zero. This can yield step sizes, calculated as the product of the inverse of the AI matrix and the vector of first derivatives, which are too large. Consequently, severe step size modifications may be required to achieve an improvement in $\log\mathcal{L}$. This may, at best, require several additional likelihood evaluations or, at worst, cause the algorithm to fail. Modification of the AI matrix, to ensure that it is ‘safely’ positive definite and that its condition number is not excessive, may improve performance of the AI algorithm in this instance.

Several strategies are available. None has been found to be ‘best’.

1.
Schnabel and Eskow [45] described a modified Cholesky decomposition of the Hessian matrix. This has been implemented using algorithm 695 of the TOMS library (www.netlib.org/toms) [11], but using a factor of $\epsilon^{1/2}$ (where $\epsilon$ denotes machine precision) to determine the critical size of pivots, which is intermediate to the original value of $\epsilon^{2/3}$ and the value of $\epsilon^{1/3}$ suggested by Schnabel and Eskow [46].
2.
A partial Cholesky decomposition has been suggested by Forsgren et al. [15]. This has been implemented using a factor of ν = 0.998.
3.
Modification strategies utilising the Cholesky decomposition have been devised for scenarios where direct calculation of the eigenvalues is impractical. For our applications, however, the computational cost of an eigenvalue decomposition of the AI matrix is negligible compared to that of a likelihood evaluation. This allows a modification where we know the minimum eigenvalue of the resulting matrix. Nocedal and Wright [42, Chapter 6] described two variations, which have been implemented:
(a)
Set all eigenvalues less than a value $\delta$ to $\delta$, and construct the modified AI matrix by pre- and postmultiplying the diagonal matrix of modified eigenvalues with the matrix of eigenvectors and its transpose, respectively.
(b)
Add a diagonal matrix $\tau\mathbf{I}$ to the AI matrix, with $\tau = \max(0,\, \delta - \lambda_{\min})$ and $\lambda_{\min}$ the smallest eigenvalue of the AI matrix. This has been chosen as the default procedure, with $\delta$ set to $3 \times 10^{-6}\,\lambda_1$, and $\lambda_1$ the largest eigenvalue of the AI matrix (see the sketch at the end of this section).

Choice of the modification can have a substantial effect on the efficiency of the AI algorithm. In particular, too large a modification can slow convergence rates unnecessarily. Further experience is necessary to determine which is a good choice of modification for specific cases.
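
As an illustration, the two eigenvalue based variations might look as follows in Python. This is a sketch, not WOMBAT's source; in particular, treating 3 × 10⁻⁶ times the largest eigenvalue as the target δ for the default variation simply follows the description above.

```python
import numpy as np


def modify_ai_floor_eigenvalues(ai, delta):
    """Variation (a): floor all eigenvalues at delta and rebuild the matrix."""
    eigvals, eigvecs = np.linalg.eigh(ai)
    return eigvecs @ np.diag(np.maximum(eigvals, delta)) @ eigvecs.T


def modify_ai_add_diagonal(ai):
    """Variation (b), the default: add tau*I so the matrix is 'safely' p.d."""
    eigvals = np.linalg.eigvalsh(ai)          # ascending order
    lam_min, lam_max = eigvals[0], eigvals[-1]
    delta = 3.0e-6 * lam_max                  # target for the smallest eigenvalue
    tau = max(0.0, delta - lam_min)           # shift only if lam_min is below delta
    return ai + tau * np.eye(ai.shape[0])
```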

A.6 Iterative summation of expanded part matrices

Consider the $q \times q$ covariance matrix $\mathbf{V}$ (among $q$ traits) for a random effect, e.g. additive genetic effects. Assume we have $S$ analyses of subsets of traits, with the $s$th analysis comprising $k_s$ traits. Further, let $\mathbf{C}_s$ denote the $k_s \times k_s$ matrix of estimates of covariance components for the random effect from analysis $s$. The pooled matrix $\mathbf{V}$ is constructed by iterating on

$$\mathbf{V}^{t+1} = \sum_{s=1}^{S} w_s \left\{ \mathbf{V}^t \left(\mathbf{P}_s\mathbf{P}_s'\mathbf{V}^t\mathbf{P}_s\mathbf{P}_s'\right)^{-} \mathbf{P}_s\mathbf{C}_s\mathbf{P}_s' \left(\mathbf{P}_s\mathbf{P}_s'\mathbf{V}^t\mathbf{P}_s\mathbf{P}_s'\right)^{-} \mathbf{V}^t + \left[\left(\mathbf{I} - \mathbf{P}_s\mathbf{P}_s'\right)\left(\mathbf{V}^t\right)^{-1}\left(\mathbf{I} - \mathbf{P}_s\mathbf{P}_s'\right)\right]^{-} \right\} \Big/ \sum_{s=1}^{S} w_s \qquad\text{(A.8)}$$

until $\mathbf{V}^t$ and $\mathbf{V}^{t+1}$ are virtually identical, with $\mathbf{V}^t$ the value of $\mathbf{V}$ for the $t$th iterate.

Other components of Equation A.8 are $w_s$, the weight given to analysis $s$, $\mathbf{I}$, an identity matrix of size $q \times q$, and the transformation matrix $\mathbf{P}_s$ for analysis $s$, of size $q \times k_s$. $\mathbf{P}_s$ has $k_s$ elements of unity, $p_{ij} = 1$ if the $i$th trait overall is the $j$th trait in analysis $s$, and all other elements zero.

Starting values ($\mathbf{V}^0$) are obtained by first decomposing the covariance matrices $\mathbf{C}_s$ into variances and correlations, and calculating simple averages over individual analyses. If the resulting correlation matrix is not positive definite, it is modified by regressing all eigenvalues towards their mean, choosing a regression factor so that the smallest eigenvalue becomes $10^{-6}$, and pre- and post-multiplying the diagonal matrix of modified eigenvalues with the matrix of eigenvectors and its transpose, respectively. Using the average variances, $\mathbf{V}^0$ is then obtained from the (modified) average correlation matrix. WOMBAT is set up to perform up to 100000 iterates. Convergence is assumed to be reached when

$$\frac{2}{q(q+1)} \sum_{i=1}^{q}\sum_{j=i}^{q} \left(v_{ij}^{t+1} - v_{ij}^{t}\right)^2 \,\le\, 10^{-7}$$

with $v_{ij}^t$ denoting the $ij$th element of $\mathbf{V}^t$.
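
A minimal Python sketch of the iteration in Equation A.8 is given below. It is illustrative only: generalised inverses are taken as Moore-Penrose pseudo-inverses, all weights default to unity, and the starting value is simply an identity matrix rather than the averaged (and, if necessary, modified) correlation matrix described above; function and argument names are made up for the example.

```python
import numpy as np


def pool_part_matrices(q, parts, weights=None, tol=1.0e-7, max_iter=100000):
    """Pool part matrices C_s into a q x q matrix V by iterating on Eq. A.8.

    parts: list of (trait_indices, C_s) pairs, with 0-based trait indices.
    """
    w = np.ones(len(parts)) if weights is None else np.asarray(weights, float)
    I = np.eye(q)

    # build the q x k_s transformation matrices P_s described above
    P = []
    for traits, _ in parts:
        Ps = np.zeros((q, len(traits)))
        for j, i in enumerate(traits):
            Ps[i, j] = 1.0
        P.append(Ps)

    V = np.eye(q)                          # simplified starting value V^0
    for _ in range(max_iter):
        V_new = np.zeros((q, q))
        for ws, Ps, (_, C) in zip(w, P, parts):
            PPt = Ps @ Ps.T
            A = np.linalg.pinv(PPt @ V @ PPt)          # (P_s P_s' V P_s P_s')^-
            B = np.linalg.pinv((I - PPt) @ np.linalg.inv(V) @ (I - PPt))
            V_new += ws * (V @ A @ (Ps @ C @ Ps.T) @ A @ V + B)
        V_new /= w.sum()
        # convergence: average squared change over the upper triangle of V
        iu = np.triu_indices(q)
        crit = 2.0 / (q * (q + 1)) * np.sum((V_new[iu] - V[iu]) ** 2)
        V = V_new
        if crit <= tol:
            break
    return V


# e.g. pool two bivariate analyses, of traits (0, 1) and (1, 2), into a 3 x 3 V
C1 = np.array([[2.0, 0.5], [0.5, 1.5]])
C2 = np.array([[1.6, 0.3], [0.3, 1.0]])
V_pooled = pool_part_matrices(3, [((0, 1), C1), ((1, 2), C2)])
```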