Appendix A
Technical details



This chapter explains some of the assumptions made, strategies and statistical procedures used in WOMBAT.

A.1 Ordering strategies

An essential set-up step in REML analyses is to find a permutation of the rows and columns of the coefficient matrix of the mixed model equations that keeps the ‘fill-in’ arising during factorisation small and thus minimises the computational effort per likelihood evaluation and REML iterate. This needs to be done only once per analysis (WOMBAT saves the results from this step for re-use in any subsequent steps). As the ordering can have a dramatic impact on the time and memory required per analysis, it is well worth considerable effort to find the ‘best’ order. Especially for analyses involving large data sets or multiple random effects, the time spent trying several, or even numerous, alternatives is readily recouped within the first few iterates [31]. WOMBAT selects a default ordering strategy based on the number of equations in the analysis.

Three different strategies are implemented:

1.
The multiple minimum degree procedure [26], as implemented in the widely used public domain subroutine genmmd. This is the strategy which has been used in DfReml. For WOMBAT, it is the default for small analyses involving up to 25000 equations; a toy sketch of the underlying minimum degree idea follows this list.
2.
The approximate minimum degree ordering of Amestoy et al. [2]. This tends to produce orderings of similar quality to the multiple minimum degree procedure, but is considerably faster. Implementation is through the public domain subroutine amd (version 1.1), available at www.cise.ufl.edu/research/sparse/amd. This is the default for analyses involving more than 25000 and up to 50000 equations.
3.
A multilevel nested dissection procedure, as implemented in subroutine metis_nodend from the (public domain) MeTis package of Karypis and Kumar [22], available at www.cs.umn.edu/karypis/metis. This is the default for large analyses.
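
To illustrate the idea underlying these strategies, the sketch below implements a naive (single elimination) minimum degree ordering in Python and counts the fill-in it produces. It is purely a toy illustration under simplifying assumptions; genmmd, amd and metis_nodend implement far more refined and vastly faster versions of such heuristics, and the function and variable names here are made up for the example.

```python
def minimum_degree_order(adj):
    """Greedy minimum degree ordering for a symmetric sparsity pattern.

    adj: dict mapping each node to the set of its neighbours (no self-edges).
    Returns the elimination order and the number of fill-in entries created.
    """
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    order, fill = [], 0
    remaining = set(adj)
    while remaining:
        # eliminate the node of currently smallest degree (ties: smallest label)
        v = min(remaining, key=lambda x: (len(adj[x]), x))
        nbrs = adj[v]
        for a in nbrs:                 # neighbours of v become a clique;
            for b in nbrs:             # each new edge corresponds to fill-in
                if a < b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
        for a in nbrs:                 # remove v from the remaining graph
            adj[a].discard(v)
        remaining.discard(v)
        order.append(v)
    return order, fill


# toy 'arrow-head' pattern: eliminating node 0 first creates 6 fill-in entries,
# whereas the minimum degree order ([1, 2, 3, 4, 0]) creates none
pattern = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(minimum_degree_order(pattern))
```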

A.2 Convergence criteria

WOMBAT calculates four different criteria to determine whether an analysis has converged. The first two are simple changes, available for all maximisation algorithms; the other two are based on the derivatives of the log likelihood function and can thus only be obtained for the AI algorithm. The criteria are:

1.
The increase in log likelihood values between subsequent iterates, i.e. $\log\mathcal{L}^t - \log\mathcal{L}^{t-1}$, with $\log\mathcal{L}^t$ the log likelihood for iterate $t$.
2.
The change in the vector of parameter estimates from the last iterate. This is evaluated as

$$\sqrt{\,\sum_{i=1}^{p}\left(\hat{\theta}_i^t - \hat{\theta}_i^{t-1}\right)^2 \Big/ \sum_{i=1}^{p}\left(\hat{\theta}_i^t\right)^2\,} \qquad\text{(A.1)}$$

where $\hat{\theta}_i^t$ denotes the estimate of the $i$th parameter from iterate $t$, and $p$ is the number of parameters.

3.
The norm of the gradient vector, i.e., for $g_i^t = \partial\log\mathcal{L}^t/\partial\theta_i$,

$$\sqrt{\,\sum_{i=1}^{p}\left(g_i^t\right)^2\,} \qquad\text{(A.2)}$$

4.
The ‘Newton decrement’, i.e.

$$-\sum_{i=1}^{p}\sum_{j=1}^{p} g_i^t\, g_j^t\, H_{ij}^t \qquad\text{(A.3)}$$

where $H_{ij}^t$ is the $ij$th element of the inverse of the average information matrix for iterate $t$. This gives a measure of the expected difference of $\log\mathcal{L}^t$ from the maximum, and has been suggested as an alternative convergence criterion [6].

Table A.1: Default thresholds for convergence criteria

                                            Algorithm
  Criterion                    AI             (PX-)EM       Powell
  Change in log L              < 5 × 10⁻⁴     < 10⁻⁵        < 10⁻⁴
  Change in parameters         < 10⁻⁸         < 10⁻⁸        < 10⁻⁸
  Norm of gradient vector      < 10⁻³         –             –
  Newton decrement             not used       –             –

Default values for the thresholds for these criteria used in WOMBAT are summarised in Table A.1.

N.B.: Current values used are rather stringent; ‘softer’ limits combining several criteria may be more appropriate for practical estimation.
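
The sketch below (plain Python/NumPy with made-up variable names, not WOMBAT's internals) shows how the four criteria listed above could be evaluated for an AI iterate and compared against the default thresholds of Table A.1.

```python
import numpy as np


def convergence_criteria(logl_t, logl_prev, theta_t, theta_prev, grad, ai_inv):
    """Return the four convergence criteria of Section A.2 for iterate t."""
    # 1. increase in log likelihood between subsequent iterates
    delta_logl = logl_t - logl_prev
    # 2. relative change in the parameter vector (Equation A.1)
    delta_theta = np.sqrt(np.sum((theta_t - theta_prev) ** 2)
                          / np.sum(theta_t ** 2))
    # 3. norm of the gradient vector (Equation A.2)
    grad_norm = np.sqrt(np.sum(grad ** 2))
    # 4. 'Newton decrement' (Equation A.3); ai_inv is the inverted AI matrix
    newton_decrement = -grad @ ai_inv @ grad
    return delta_logl, delta_theta, grad_norm, newton_decrement


# example check against the AI column of Table A.1 (Newton decrement not used)
dl, dtheta, gnorm, _ = convergence_criteria(
    -1234.5678, -1234.5681,
    np.array([1.00, 0.50, 0.25]), np.array([1.00, 0.50, 0.25]),
    np.array([1.0e-4, -2.0e-4, 5.0e-5]),
    np.linalg.inv(np.array([[2.0, 0.1, 0.0],
                            [0.1, 1.0, 0.0],
                            [0.0, 0.0, 0.5]])))
converged = (dl < 5.0e-4) and (dtheta < 1.0e-8) and (gnorm < 1.0e-3)
```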

A.3 Parameterisation

A.4 Approximation of sampling errors

A.4.1 Sampling covariances

At convergence, the inverse of the AI matrix gives estimates of lower bound sampling covariances among the parameters estimated. These are used to approximate sampling errors of covariance components and genetic parameters.

For full rank analyses parameterising to the elements of the Cholesky factors of the corresponding covariance matrices, the AI matrix is obtained by first calculating the AI matrix for the covariance components. This is then transformed to the AI matrix for the parameters to be estimated by pre- and postmultiplying it with the corresponding Jacobian and its transpose, respectively. Full details are given in Meyer and Smith [39]. This implies that sampling covariances among the covariance components can be obtained directly by simply inverting the corresponding AI matrix.

For reduced rank analyses, however, the AI matrix for the parameters to be estimated is calculated directly, as outlined by Meyer and Kirkpatrick [36]. Hence, sampling covariances among the corresponding covariance components need to be approximated from the inverse of the AI matrix. WOMBAT estimates the leading columns of the Cholesky factor ($\mathbf{L}$) of a matrix to obtain a reduced rank estimate, $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}'$. Let $l_{ir}$ (for $i \ge r$) denote the non-zero elements of $\mathbf{L}$. The inverse of the AI matrix then gives approximate sampling covariances $\mathrm{Cov}(l_{ir}, l_{js})$. The $ij$th covariance component, $\sigma_{ij}$, is

$$\sigma_{ij} = \sum_{r=1}^{q(i,j)} l_{ir}\, l_{jr}$$

with $q(i,j) = \min(i,j,t)$ and $t$ the rank which the estimate of $\boldsymbol{\Sigma}$ is set to have. The covariance between two covariance components, $\sigma_{ij}$ and $\sigma_{km}$, is then

$$\mathrm{Cov}\left(\sigma_{ij}, \sigma_{km}\right) = \sum_{r=1}^{q(i,j)} \sum_{s=1}^{q(k,m)} \mathrm{Cov}\left(l_{ir} l_{jr},\, l_{ks} l_{ms}\right)$$

Using a first order Taylor series approximation to the product of two variables, this can be approximated as

$$\mathrm{Cov}\left(\sigma_{ij}, \sigma_{km}\right) \approx \sum_{r=1}^{q(i,j)} \sum_{s=1}^{q(k,m)} \left[\, l_{jr} l_{ms}\,\mathrm{Cov}(l_{ir}, l_{ks}) + l_{jr} l_{ks}\,\mathrm{Cov}(l_{ir}, l_{ms}) + l_{ir} l_{ms}\,\mathrm{Cov}(l_{jr}, l_{ks}) + l_{ir} l_{ks}\,\mathrm{Cov}(l_{jr}, l_{ms}) \,\right] \qquad\text{(A.4)}$$

Equation A.4 extends readily to two covariance components belonging to different covariance matrices, Σ1 and Σ2, and their respective Cholesky factors.
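
As an illustration of Equation A.4, the following sketch approximates Cov(σij, σkm) from a reduced rank Cholesky factor and the sampling covariances of its non-zero elements. It is a minimal sketch rather than WOMBAT code: indices are 0-based, L is assumed to be a NumPy array holding the leading t columns of the Cholesky factor, and cov_l a user-supplied function returning Cov(l_ir, l_ks) as read from the inverse of the AI matrix.

```python
def cov_sigma(L, cov_l, i, j, k, m, rank):
    """Approximate Cov(sigma_ij, sigma_km) via Equation A.4 (0-based indices)."""
    q_ij = min(i + 1, j + 1, rank)    # q(i,j): number of non-zero products
    q_km = min(k + 1, m + 1, rank)
    total = 0.0
    for r in range(q_ij):
        for s in range(q_km):
            # first order expansion of Cov(l_ir * l_jr, l_ks * l_ms)
            total += (L[j, r] * L[m, s] * cov_l((i, r), (k, s))
                      + L[j, r] * L[k, s] * cov_l((i, r), (m, s))
                      + L[i, r] * L[m, s] * cov_l((j, r), (k, s))
                      + L[i, r] * L[k, s] * cov_l((j, r), (m, s)))
    return total
```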

A.4.2 Sampling errors of genetic parameters

Let $\boldsymbol{\sigma}$ denote the vector of covariance components in the model of analysis and $V(\boldsymbol{\sigma})$ its approximate matrix of sampling covariances, obtained as described above (subsection A.4.1). The sampling covariance for any pair of linear functions of $\boldsymbol{\sigma}$ is then simply

$$\mathrm{Cov}\left(\mathbf{w}_1'\boldsymbol{\sigma},\, \mathbf{w}_2'\boldsymbol{\sigma}\right) = \mathbf{w}_1'\, V(\boldsymbol{\sigma})\, \mathbf{w}_2 \qquad\text{(A.5)}$$

with $\mathbf{w}_i$ the vector of weights in linear function $i$.

Sampling variances of non-linear functions of covariance components are obtained by first approximating the function by a first order Taylor series expansion, and then calculating the variance of the resulting linear function. For example, for a variance ratio

$$\mathrm{Var}\!\left(\frac{\sigma_1^2}{\sigma_2^2}\right) \approx \left[\sigma_2^4\,\mathrm{Var}(\sigma_1^2) + \sigma_1^4\,\mathrm{Var}(\sigma_2^2) - 2\,\sigma_1^2\sigma_2^2\,\mathrm{Cov}(\sigma_1^2, \sigma_2^2)\right]\big/\sigma_2^8 \qquad\text{(A.6)}$$

Similarly, for a correlation

$$\mathrm{Var}\!\left(\frac{\sigma_{12}}{\sqrt{\sigma_1^2\sigma_2^2}}\right) \approx r^2\left[\frac{\mathrm{Var}(\sigma_{12})}{\sigma_{12}^2} + \frac{\mathrm{Var}(\sigma_1^2)}{4\sigma_1^4} + \frac{\mathrm{Var}(\sigma_2^2)}{4\sigma_2^4} - \frac{\mathrm{Cov}(\sigma_{12},\sigma_1^2)}{\sigma_{12}\sigma_1^2} - \frac{\mathrm{Cov}(\sigma_{12},\sigma_2^2)}{\sigma_{12}\sigma_2^2} + \frac{\mathrm{Cov}(\sigma_1^2,\sigma_2^2)}{2\sigma_1^2\sigma_2^2}\right] \qquad\text{(A.7)}$$

with $r = \sigma_{12}/\sqrt{\sigma_1^2\sigma_2^2}$ the corresponding correlation.
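
The following sketch (illustrative only, with made-up names) evaluates Equations A.5 to A.7 for given covariance components and their approximate sampling (co)variances; here v1, v2 and c12 stand for σ1², σ2² and σ12, respectively.

```python
import numpy as np


def cov_linear(w1, w2, V):
    """Cov(w1' sigma, w2' sigma) = w1' V(sigma) w2 (Equation A.5)."""
    return w1 @ V @ w2


def var_ratio(v1, v2, var_v1, var_v2, cov_v1_v2):
    """Var(v1 / v2), e.g. a variance ratio such as a heritability (Eq. A.6)."""
    return (v2 ** 2 * var_v1 + v1 ** 2 * var_v2
            - 2.0 * v1 * v2 * cov_v1_v2) / v2 ** 4


def var_correlation(c12, v1, v2, var_c12, var_v1, var_v2,
                    cov_c12_v1, cov_c12_v2, cov_v1_v2):
    """Var(r) for r = c12 / sqrt(v1 * v2), to first order (Equation A.7)."""
    r = c12 / np.sqrt(v1 * v2)
    return r ** 2 * (var_c12 / c12 ** 2
                     + var_v1 / (4.0 * v1 ** 2)
                     + var_v2 / (4.0 * v2 ** 2)
                     - cov_c12_v1 / (c12 * v1)
                     - cov_c12_v2 / (c12 * v2)
                     + cov_v1_v2 / (2.0 * v1 * v2))
```

Note that var_ratio is Equation A.6 written directly in terms of the two variances (v1 = σ1², v2 = σ2²), so the powers 4 and 8 of σ in the text become powers 2 and 4 of the arguments here.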

A.5 Modification of the average information matrix

To yield a search direction which is likely to improve the likelihood, or, equivalently, decrease $-\log\mathcal{L}$, the Hessian matrix or its approximation in a Newton type optimisation strategy must be positive definite. While the AI matrix is a matrix of sums of squares and crossproducts and thus virtually guaranteed to be positive definite, it can have a relatively large condition number or minimum eigenvalues close to zero. This can yield step sizes, calculated as the product of the inverse of the AI matrix and the vector of first derivatives, which are too large. Consequently, severe step size modifications may be required to achieve an improvement in $\log\mathcal{L}$. This may, at best, require several additional likelihood evaluations or, at worst, cause the algorithm to fail. Modification of the AI matrix, to ensure that it is ‘safely’ positive definite and that its condition number is not excessive, may improve performance of the AI algorithm in this instance.

Several strategies are available. None has been found to be ‘best’.

1.
Schnabel and Eskow [45] described a modified Cholesky decomposition of the Hessian matrix. This has been implemented using algorithm 695 of the TOMS library (www.netlib.org/toms) [11], but using a factor of $\epsilon^{1/2}$ (where $\epsilon$ denotes machine precision) to determine the critical size of pivots, which is intermediate to the original value of $\epsilon^{2/3}$ and the value of $\epsilon^{1/3}$ suggested by Schnabel and Eskow [46].
2.
A partial Cholesky decomposition has been suggested by Forsgren et al. [15]. This has been implemented using a factor of ν = 0.998.
3.
Modification strategies utilising the Cholesky decomposition have been devised for scenarios where direct calculation of the eigenvalues is impractical. For our applications, however, the computational cost of an eigenvalue decomposition of the AI matrix is negligible compared to that of a likelihood evaluation. This allows a modification where we know the minimum eigenvalue of the resulting matrix. Nocedal and Wright [42, Chapter 6] described two variations, which have been implemented:
(a)
Set all eigenvalues less than a value $\delta$ to $\delta$, and construct the modified AI matrix by pre- and postmultiplying the diagonal matrix of modified eigenvalues with the matrix of eigenvectors and its transpose, respectively.
(b)
Add a diagonal matrix $\tau\mathbf{I}$ to the AI matrix, with $\tau = \max(0,\, \delta - \lambda_{\min})$ and $\lambda_{\min}$ the smallest eigenvalue of the AI matrix. This has been chosen as the default procedure, with $\delta$ set to $3 \times 10^{-6}\,\lambda_1$, and $\lambda_1$ the largest eigenvalue of the AI matrix (see the sketch at the end of this section).

Choice of the modification can have a substantial effect on the efficiency of the AI algorithm. In particular, too large a modification can slow convergence rates unnecessarily. Further experience is necessary to determine which is a good choice of modification for specific cases.
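
As an illustration, the two eigenvalue based variations might look as follows in Python. This is a sketch, not WOMBAT's source; in particular, treating 3 × 10⁻⁶ times the largest eigenvalue as the target δ for the default variation simply follows the description above.

```python
import numpy as np


def modify_ai_floor_eigenvalues(ai, delta):
    """Variation (a): floor all eigenvalues at delta and rebuild the matrix."""
    eigvals, eigvecs = np.linalg.eigh(ai)
    return eigvecs @ np.diag(np.maximum(eigvals, delta)) @ eigvecs.T


def modify_ai_add_diagonal(ai):
    """Variation (b), the default: add tau*I so the matrix is 'safely' p.d."""
    eigvals = np.linalg.eigvalsh(ai)          # ascending order
    lam_min, lam_max = eigvals[0], eigvals[-1]
    delta = 3.0e-6 * lam_max                  # target for the smallest eigenvalue
    tau = max(0.0, delta - lam_min)           # shift only if lam_min is below delta
    return ai + tau * np.eye(ai.shape[0])
```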

A.6 Iterative summation of expanded part matrices

Consider the $q \times q$ covariance matrix $\mathbf{V}$ (among $q$ traits) for a random effect, e.g. additive genetic effects. Assume we have $S$ analyses of subsets of traits, with the $s$th analysis comprising $k_s$ traits. Further, let $\mathbf{C}_s$ denote the $k_s \times k_s$ matrix of estimates of covariance components for the random effect from analysis $s$. The pooled matrix $\mathbf{V}$ is constructed by iterating on

$$\mathbf{V}^{t+1} = \sum_{s=1}^{S} w_s \left\{ \mathbf{V}^t \left(\mathbf{P}_s\mathbf{P}_s'\mathbf{V}^t\mathbf{P}_s\mathbf{P}_s'\right)^{-} \mathbf{P}_s\mathbf{C}_s\mathbf{P}_s' \left(\mathbf{P}_s\mathbf{P}_s'\mathbf{V}^t\mathbf{P}_s\mathbf{P}_s'\right)^{-} \mathbf{V}^t + \left[\left(\mathbf{I} - \mathbf{P}_s\mathbf{P}_s'\right)\left(\mathbf{V}^t\right)^{-1}\left(\mathbf{I} - \mathbf{P}_s\mathbf{P}_s'\right)\right]^{-} \right\} \Big/ \sum_{s=1}^{S} w_s \qquad\text{(A.8)}$$

until $\mathbf{V}^t$ and $\mathbf{V}^{t+1}$ are virtually identical, with $\mathbf{V}^t$ the value of $\mathbf{V}$ for the $t$th iterate.

Other components of Equation A.8 are $w_s$, the weight given to analysis $s$, $\mathbf{I}$, an identity matrix of size $q \times q$, and the transformation matrix $\mathbf{P}_s$ for analysis $s$, of size $q \times k_s$. $\mathbf{P}_s$ has $k_s$ elements of unity, $p_{ij} = 1$ if the $i$th trait overall is the $j$th trait in analysis $s$, and all other elements zero.

Starting values ($\mathbf{V}^0$) are obtained by first decomposing the covariance matrices $\mathbf{C}_s$ into variances and correlations, and calculating simple averages over individual analyses. If the resulting correlation matrix is not positive definite, it is modified by regressing all eigenvalues towards their mean, choosing a regression factor so that the smallest eigenvalue becomes $10^{-6}$, and pre- and post-multiplying the diagonal matrix of modified eigenvalues with the matrix of eigenvectors and its transpose, respectively. Using the average variances, $\mathbf{V}^0$ is then obtained from the (modified) average correlation matrix. WOMBAT is set up to perform up to 100000 iterates. Convergence is assumed to be reached when

$$\frac{2}{q(q+1)} \sum_{i=1}^{q}\sum_{j=i}^{q} \left(v_{ij}^{t+1} - v_{ij}^{t}\right)^2 \,\le\, 10^{-7}$$

with $v_{ij}^t$ denoting the $ij$th element of $\mathbf{V}^t$.
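
A minimal Python sketch of the iteration in Equation A.8 is given below. It is illustrative only: generalised inverses are taken as Moore-Penrose pseudo-inverses, all weights default to unity, and the starting value is simply an identity matrix rather than the averaged (and, if necessary, modified) correlation matrix described above; function and argument names are made up for the example.

```python
import numpy as np


def pool_part_matrices(q, parts, weights=None, tol=1.0e-7, max_iter=100000):
    """Pool part matrices C_s into a q x q matrix V by iterating on Eq. A.8.

    parts: list of (trait_indices, C_s) pairs, with 0-based trait indices.
    """
    w = np.ones(len(parts)) if weights is None else np.asarray(weights, float)
    I = np.eye(q)

    # build the q x k_s transformation matrices P_s described above
    P = []
    for traits, _ in parts:
        Ps = np.zeros((q, len(traits)))
        for j, i in enumerate(traits):
            Ps[i, j] = 1.0
        P.append(Ps)

    V = np.eye(q)                          # simplified starting value V^0
    for _ in range(max_iter):
        V_new = np.zeros((q, q))
        for ws, Ps, (_, C) in zip(w, P, parts):
            PPt = Ps @ Ps.T
            A = np.linalg.pinv(PPt @ V @ PPt)          # (P_s P_s' V P_s P_s')^-
            B = np.linalg.pinv((I - PPt) @ np.linalg.inv(V) @ (I - PPt))
            V_new += ws * (V @ A @ (Ps @ C @ Ps.T) @ A @ V + B)
        V_new /= w.sum()
        # convergence: average squared change over the upper triangle of V
        iu = np.triu_indices(q)
        crit = 2.0 / (q * (q + 1)) * np.sum((V_new[iu] - V[iu]) ** 2)
        V = V_new
        if crit <= tol:
            break
    return V


# e.g. pool two bivariate analyses, of traits (0, 1) and (1, 2), into a 3 x 3 V
C1 = np.array([[2.0, 0.5], [0.5, 1.5]])
C2 = np.array([[1.6, 0.3], [0.3, 1.0]])
V_pooled = pool_part_matrices(3, [((0, 1), C1), ((1, 2), C2)])
```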