
Chapter 7
Output generated by WOMBAT

This chapter describes the ﬁles written out by WOMBAT. These comprise ‘internal’ ﬁles generated by WOMBAT for use within a run (and thus of limited interest to the user), and various output ﬁles. Most ﬁles have default names, independent of the analysis.

7.1 Main results ﬁles

These are formatted summary ﬁles to be read (rather than processed by other programs). They have the extension .out.

7.1.1 File SumPedigrees.out

If a pedigree ﬁle is speciﬁed, this ﬁle gives some summary statistics on the pedigree information found.

7.1.2 File SumModel.out

This file gives a summary of the model of analysis specified and the corresponding features of the data found. Statistics given include means, standard deviations and ranges for traits and covariables, and numbers of levels found for the effects in the model of analysis.

It is written after the ﬁrst processing of the input ﬁles, i.e. during the set-up steps.

7.1.3 File SumEstimates.out

This file provides the estimates of the parameters, the resulting covariance matrices and their eigenvalues and, for reduced rank analyses, the corresponding matrices of eigenvectors. WOMBAT also writes out the corresponding matrices of correlations and variance ratios. In addition, values for any user-defined functions of covariances (see section 4.12) are written out.

If the ﬁnal estimates were obtained using the AI algorithm, WOMBAT provides approximate sampling errors for the parameters and covariance components estimated, as given by the inverse of the respective average information matrices. In addition, sampling errors of variance ratios and correlations are derived, as described in subsection A.4.2.
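For orientation only, the usual first-order (delta method) approximation to the sampling variance of a ratio r = a/b of two estimated components is shown below. The symbols a and b are generic and introduced here for illustration (e.g. a = σ²_A and b = σ²_P = σ²_A + σ²_E for a heritability, with the sampling (co)variances of σ²_A and σ²_E taken from the inverse of the information matrix); subsection A.4.2 describes the calculations WOMBAT actually performs.

```latex
\operatorname{Var}\!\left(\frac{a}{b}\right) \approx
\left(\frac{a}{b}\right)^{2}
\left[\frac{\operatorname{Var}(a)}{a^{2}}
    + \frac{\operatorname{Var}(b)}{b^{2}}
    - \frac{2\operatorname{Cov}(a,b)}{a\,b}\right],
\qquad
\begin{aligned}
\operatorname{Var}(b) &= \operatorname{Var}(\sigma^2_A) + \operatorname{Var}(\sigma^2_E)
                        + 2\operatorname{Cov}(\sigma^2_A,\sigma^2_E),\\
\operatorname{Cov}(a,b) &= \operatorname{Var}(\sigma^2_A) + \operatorname{Cov}(\sigma^2_A,\sigma^2_E).
\end{aligned}
```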

7.1.4 File BestSoFar.out

This is an abbreviated version of SumEstimates.out, written out when the command line option --best is speciﬁed. It gives matrices of estimates pertaining to the set of parameters with the highest likelihood found so far.

7.1.5 File FixSolutions.out

This ﬁle lists the generalised least-squares solutions for all ﬁxed eﬀects ﬁtted, together with ‘raw’ means and numbers of observations for individual subclasses.

HINT: If this file is the ‘by-product’ of an estimation run using the AI-REML algorithm (default), no standard errors for fixed effects are given. The reason is that the AI algorithm does not involve calculation of the inverse of the coefficient matrix of the mixed model equations. Asymptotic lower bound standard errors are written out, however, if the (PX-)EM algorithm is used in the last iterate performed, or if a BLUP run is specified, as both evaluate this inverse. You can force WOMBAT to calculate standard errors by supplying the option FORCE-SE in a SPECIAL block (see subsection 4.17.4). Note, though, that this may require additional calculations to invert the coefficient matrix, which can be demanding for large analyses.

7.1.6 File SumSampleAI.out

This file is only written out when specifying run option --sample (see subsection 5.2.9). It gives a brief summary of the characteristics of the analysis and average information matrix for which samples were obtained, together with means and variances across replicates. In addition, large deviations between theoretical results from the information matrix and samples obtained are flagged.

7.2 Additional results

These are large files, most likely subject to further processing by other programs. Thus they contain minimal or no text. They have extension .dat.

7.2.1 File Residuals.dat

This file gives the residuals for all observations, for the model fitted and the current estimates of covariance components. It has the same order as the data file, and contains 3 or more space-separated columns:

(a): Column 1 contains the estimated residual.
(b): Column 2 gives the corresponding predicted observation (y-hat).
(c): Column 3 gives the corresponding observation.
(d): Column 4 gives the observation adjusted for ﬁxed eﬀects for analyses with run options --blup, --solvit or --s1step. This is useful in calculating predictive accuracies of breeding values (forward cross-validation).

The ﬁrst line contains the column names. Note that this ﬁle is not produced for runs involving iterative solution of the mixed model equations.

Summary statistics about the distribution of residuals can be readily obtained using standard statistical packages. For example, the following R commands compute means, standard deviations and quartiles, and plot residuals against observations as well as a histogram of the residuals :

EXAMPLE:

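res <- read.table("Residuals.dat", header = TRUE)  # first line holds column names
summary(res)                # means, medians and quartiles for each column
sapply(res, sd)             # standard deviations for each column
par(mfrow = c(1, 2))        # two plots side by side
plot(res[, 1], res[, 3])    # residuals (x) against observations (y)
hist(res[, 1])              # distribution of residuals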

7.2.2 File(s) RnSoln_rname.dat

Solutions for each random eﬀect are written to a separate ﬁle. These ﬁles have names RnSoln_rname.dat, with rname representing the name of the random eﬀect. Columns in these ﬁles are :

(a): Column 1 gives the running number for the level considered
(b): Column 2 gives the corresponding original code.
(c): Column 3 gives the ‘effect’ number, where, depending on the analysis, ‘effect’ is a trait, principal component or random regression coefficient.
(d): Column 4 gives the solution.
(e): Column 5 gives the sampling error of the solution, calculated as the square root of the corresponding diagonal element of the inverse of the coefficient matrix in the mixed model equations. This is only available if the last iterate has been carried out using an EM or PX-EM algorithm (or if FORCE-SE has been invoked).

For genetic random eﬀects with covariance option NRM, WOMBAT calculates inbreeding coeﬃcients from the list of pedigrees speciﬁed. For such eﬀects, there may be an additional column containing these coeﬃcients (in %). This should be the last column in the RnSoln_rname.dat ﬁle.

There may be up to 7 columns – ignore column 6 unless you recognize the numbers (column 6 attempts to give accuracies of estimates, but calculations are not fully debugged; o.k. for simple models but not all cases).
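If solutions are to be processed further, a minimal Python sketch like the following could read an RnSoln_rname.dat file into a dictionary keyed by original code and effect number. It assumes whitespace-separated columns as listed above, skips any line whose first field is not an integer (in case a heading is present) and ignores columns beyond the fifth (accuracy, inbreeding); the function name read_rnsoln is hypothetical.

```python
def read_rnsoln(lines):
    """Parse data lines of an RnSoln_rname.dat file (hypothetical helper).

    Assumed columns: running number, original code, effect number,
    solution and, optionally, sampling error. Further columns are ignored.
    """
    solutions = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip blank lines or a possible heading
        code = fields[1]                 # original code, kept as text
        effect = int(fields[2])          # trait, PC or RR coefficient number
        value = float(fields[3])         # solution
        se = float(fields[4]) if len(fields) > 4 else None
        solutions[(code, effect)] = (value, se)
    return solutions
```

For instance, `with open("RnSoln_animal.dat") as f: sol = read_rnsoln(f)`, where animal stands for whatever name was given to the random effect.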

If you have carried out a reduced rank analysis, i.e. specified the PC option for the analysis type, the solutions in RnSoln_rname.dat pertain to the principal components! You might then also be interested in the corresponding solutions on the original scale; WOMBAT endeavours to calculate these for you and writes them to the file RnSoln_rname-tr.dat. However, if you have carried out a run in which you have calculated standard errors for the effects fitted, these are ignored in the back-transformation and you will find that column 5 (in RnSoln_rname-tr.dat) consists entirely of zeros. This does not mean that these s.e. are zero, only that they have not been determined.

For multi-trait random regression analyses, regression coefficients are ordered by trait, i.e. coefficients 1, 2, … for trait 1 followed by coefficients 1, 2, … for trait 2, ….
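This ordering amounts to a simple index calculation. The Python sketch below (function name hypothetical) returns the 1-based position of a coefficient in the combined list; it allows the number of coefficients to differ between traits, and reduces to (trait - 1) * k + coefficient when every trait has k coefficients.

```python
def coefficient_position(trait, coef, n_coefs):
    """1-based position of RR coefficient `coef` of `trait` when coefficients
    are ordered by trait (all of trait 1, then all of trait 2, ...).

    n_coefs[t - 1] is the number of coefficients fitted for trait t.
    """
    if not 1 <= trait <= len(n_coefs):
        raise ValueError("trait out of range")
    if not 1 <= coef <= n_coefs[trait - 1]:
        raise ValueError("coefficient out of range")
    # all coefficients of the preceding traits come first
    return sum(n_coefs[: trait - 1]) + coef
```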

7.2.3 File(s) Curve_cvname(_trname).dat

At convergence, curves for fixed covariables fitted are evaluated and written to separate files, one per covariable and trait. These have names Curve_cvname.dat for univariate analyses and Curve_cvname_trname.dat for multivariate analyses, with cvname the name of the covariable as specified in the parameter file and, correspondingly, trname the name of the trait. Curves are only evaluated at the values of the covariable found in the data, rounded to the nearest integer. Each file has four columns :

(a): Column 1 gives the value of the covariable.
(b): Column 2 gives the point on the ﬁtted curve.
(c): Column 3 contains the number of observations with this value of the covariable.
(d): Column 4 gives the corresponding raw mean.

HINT: To get most information from these ﬁles, it might be worth your while scaling your covariables prior to analysis!

7.2.4 File(s) RanRegname.dat

For random regression analyses, WOMBAT evaluates variance components (file RanRegVariances.dat), variance ratios (file RanRegVarRatios.dat, not written if more than one control variable is used) and selected correlations (RanRegCorrels.dat) for values of the control variable(s) occurring in the data. If approximate sampling variances of parameters are available, WOMBAT also attempts to approximate the corresponding sampling errors. The general layout of the files is as follows :

(a): Column 1 gives the running number of the value of the control variable.
(b): Column 2 gives the corresponding actual value (omitted if more than one control variable is used).
(c): The following columns give the variance components, ratios or correlations.

(i): If sampling errors are available, each source of variation is represented by two columns, i.e. value followed by the approximate lower bound sampling error, with additional spaces between ‘pairs’ of numbers.
(ii): Random effects are listed in the same order as the starting values for random effects covariances are given in the parameter file.
(iii): If the same control variable is used for all random effects, WOMBAT attempts to calculate a total, ‘phenotypic’ variance and corresponding variance ratios and correlations.
(iv): Correlations are calculated for 5 values of the control variable, corresponding to lowest and highest value, and 3 approximately equidistant intermediate values.

In addition, the ﬁles contain some rudimentary headings.

If the special option RRCORR-ALL has been specified (see subsection 4.17.14), a file RanRegCorrAll.dat is written out in addition. This contains the following columns:

(a): The name of the random eﬀect
(b): The running number for trait one
(c): The running number for trait two
(d): The running number for the pair of traits
(e): The value of the control variable (“age”) for trait one
(f): The value of the control variable for trait two
(g): The estimated covariance between traits one and two for the speciﬁed ages
(h): The corresponding correlation

7.2.5 Files SimDatan.dat

Simulated records are written to files with the standard name SimDatan.dat, where n is a three-digit integer value (e.g. 001, 002, …). These files have the same number of columns as specified for the data file in the parameter file (i.e. any trailing variables in the original data file not listed are ignored), with the trait values replaced by simulated records. These variables are followed by the simulated values for individual random effects : The first of these values is the residual error term; the other values are the random effects as sampled (standard uni-/multivariate analyses) or as evaluated using the random regression coefficients sampled, in the same order as the corresponding covariance matrices are specified in the parameter file. Except for the trait number in multivariate analyses (first variable), all variables are written out as real values.
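A minimal Python sketch of how replicate file names are formed and how one line of such a file might be split up. It assumes that n_data_columns equals the number of data-file columns specified in the parameter file; both function names are hypothetical.

```python
def simdata_filename(n):
    """File name for replicate n, using three-digit numbering (001, 002, ...)."""
    return f"SimData{n:03d}.dat"


def split_simulated_record(line, n_data_columns):
    """Split one SimDatan.dat line into the data columns (traits replaced by
    simulated records), the residual and the remaining random effects."""
    values = line.split()
    data = values[:n_data_columns]                # as in the data file
    effects = [float(v) for v in values[n_data_columns:]]
    residual, random_effects = effects[0], effects[1:]  # residual comes first
    return data, residual, random_effects
```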

7.2.6 Files EstimSubSetn+…+m.dat

If an analysis considering a subset of traits is carried out, WOMBAT writes out a file EstimSubsetn+…+m.dat with the estimates of covariance matrices for this analysis. Writing of this file is ‘switched on’ when encountering the syntax “m”->n, specifying the trait number in the parameter file (see subsection 4.10.7). The first two lines of EstimSubsetn+…+m.dat give the following information :

(a): The number of traits in the subset and their names, as given in the parameter ﬁle.
(b): The corresponding trait numbers in the ‘full’ analysis.

This is followed by the covariance matrices estimated. The ﬁrst matrix given is the matrix of residual covariances, the other covariance matrices are given in the same order as speciﬁed in the parameter ﬁle.

(c): The first line for each covariance matrix gives the running number of the random effect, the order of fit and the name of the effect.
(d): The following lines give the elements of the covariance matrix, with one line per row.

The number of rows written is equal to the number of traits in the subset; for random eﬀects not ﬁtted for all traits, corresponding rows and columns of zeros are written out.

Finally, EstimSubsetn+…+m.dat gives some information on the data structure (not used subsequently) :

(e): The number of records for each trait
(f): The number of individuals with pairs of records
(g): The number of levels for the random eﬀects ﬁtted

7.2.7 Files PDMatrix.dat and PDBestPoint

These files give the pooled covariance matrices, obtained by running WOMBAT with option --itsum.

PDMatrix.dat is meant to be readily pasted (as starting values) into the parameter ﬁle for an analysis considering all traits simultaneously. It contains the following information for each covariance matrix :

(a): A line with the qualiﬁer VAR, followed by the name of the random eﬀect and the order and rank of the covariance matrix.
(b): The elements of the upper triangle of the covariance matrix; these are written out as one element per line.
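To inspect the pooled matrices outside WOMBAT, PDMatrix.dat could be read with a sketch like the one below. It assumes that each VAR line holds exactly the qualifier, the effect name, the order and the rank, in that order, and that the upper-triangle elements are written row-wise; both are assumptions, as is the function name.

```python
def read_pdmatrix(lines):
    """Parse PDMatrix.dat into {effect name: full symmetric matrix}.

    Assumed layout per matrix: 'VAR name order rank', then the
    order*(order+1)/2 upper-triangle elements (row-wise), one per line.
    """
    tokens = [line.split() for line in lines if line.strip()]
    matrices, i = {}, 0
    while i < len(tokens):
        head = tokens[i]
        if head[0].upper() != "VAR":
            raise ValueError(f"expected VAR line, got {head}")
        name, order = head[1], int(head[2])
        n_elem = order * (order + 1) // 2
        elems = [float(tokens[i + 1 + k][0]) for k in range(n_elem)]
        full = [[0.0] * order for _ in range(order)]
        k = 0
        for r in range(order):
            for c in range(r, order):
                full[r][c] = full[c][r] = elems[k]  # mirror to lower triangle
                k += 1
        matrices[name] = full
        i += 1 + n_elem
    return matrices
```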

PDBestPoint has the same form as BestPoint (see subsection 7.3.3). It is meant to be copied (or renamed) to BestPoint, so that WOMBAT can be run with option --best to generate a ‘results’ ﬁle (BestSoFar) with correlations, variance ratios and eigenvalues of the pooled covariance matrices.

7.2.8 Files PoolEstimates.out and PoolBestPoint

These files provide the results from a run with the option --pool:

1.: PoolEstimates.out summarizes characteristics of the part estimates provided (input), options chosen, and results for all analyses carried out.
2.: PoolBestPoint is the equivalent to BestPoint. If penalized analyses are carried out, copies labelled PoolBestPoint_unpen and PoolBestPoint_txx, with xx equal to the tuning factor, are generated so that ﬁles for all sub-analyses are available at the end of the run.

7.2.9 Files MME*.dat

The two ﬁles MMECoeffMatrix.dat and MMEEqNos+Solns.dat are written out when the run option --mmeout is speciﬁed.

MMECoeffMatrix.dat: contains the non-zero elements in the lower triangle of the coefficient matrix in the MME. There is one line per element, containing 3 space-separated variables:

(a): The row number (integer); in running order from 1 to N, with N the total number of equations.
(b): The column number (integer); in running order from 1 to N.
(c): The element (real).

HINT: This ﬁle is in the correct format to be inverted using run option --invert or --invrev.

MMEEqNos+Solns.dat: provides the mapping of equation numbers (1 to N) to eﬀects in the model, as well as the right hand sides and solutions. This ﬁle has one line per equation, with the following, space separated variables:
(a): The equation number (integer).
(b): The name of the trait, truncated to 12 letters for long names.
(c): The name of the eﬀect, truncated to 12 letters (For random eﬀects and analyses using the PC option, this is replaced by PCn).
(d): The original code for this level and eﬀect (integer); for covariables this is replaced by the ‘set number’ (= 1 for non-nested covariables).
(e): The running number for this level (within eﬀect).
(f): The diagonal element of the coeﬃcient matrix in the MME (real).
(g): The right hand side in the MME (real).
(h): The solution for this eﬀect (real).
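One use of these two files together is to check that the solutions satisfy the mixed model equations, i.e. that the coefficient matrix times the vector of solutions reproduces the right hand sides. The Python sketch below (function name hypothetical) expands the lower-triangle elements symmetrically and takes the right hand side and solution from the last two fields of each line of MMEEqNos+Solns.dat.

```python
def mme_residual_check(coeff_lines, eqn_lines):
    """Largest absolute difference between C * solutions and the right hand
    sides, with C given as lower-triangle (row, column, element) triplets."""
    rhs, sol, n = {}, {}, 0
    for line in eqn_lines:
        f = line.split()
        if not f:
            continue
        eq = int(f[0])
        rhs[eq] = float(f[-2])   # second-to-last field: right hand side
        sol[eq] = float(f[-1])   # last field: solution
        n = max(n, eq)
    product = [0.0] * (n + 1)    # index 0 unused; equations run 1..N
    for line in coeff_lines:
        f = line.split()
        if not f:
            continue
        i, j, c = int(f[0]), int(f[1]), float(f[2])
        product[i] += c * sol[j]
        if i != j:               # only the lower triangle is stored
            product[j] += c * sol[i]
    return max(abs(product[eq] - rhs[eq]) for eq in rhs)
```

A result close to zero, relative to the size of the right hand sides, indicates that the two files are consistent.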

7.2.10 Files SNPSolutions.dat and SNPCovariances.dat

SNPSolutions.dat is the output file for a run using option --snap. It contains one line for each line found in the input file SNPCounts.dat, containing:

(a): The estimated SNP eﬀect (regression coeﬃcient).
(b): Its standard error (from the inverse of the coeﬃcient matrix).
(c): The t-value, i.e. the ratio of the estimate to its standard error.
(d): A character variable with a name for the line.

SNPCovariances.dat is generated only when multiple SNPs are considered simultaneously and the option COVOUT is speciﬁed (see subsection 4.17.7). This causes the upper triangle of the matrix of sampling covariances to be written to this ﬁle (n(n + 1)∕2 elements for n SNPs, space separated).
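Because only n(n + 1)/2 values are written, the number of SNPs n can be recovered from the count m of values as n = (√(8m + 1) - 1)/2. The Python sketch below uses this to rebuild the full symmetric matrix of sampling covariances; it assumes the upper triangle is stored row-wise, and the function name is hypothetical.

```python
import math


def unpack_upper_triangle(values):
    """Full symmetric n x n matrix from n(n+1)/2 upper-triangle values
    (assumed row-wise), with n recovered from the number of values."""
    m = len(values)
    n = (math.isqrt(8 * m + 1) - 1) // 2
    if n * (n + 1) // 2 != m:
        raise ValueError(f"{m} is not a triangular number")
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for r in range(n):
        for c in range(r, n):
            full[r][c] = full[c][r] = values[k]
            k += 1
    return full
```

Sampling correlations between SNP estimates then follow as full[i][j] / math.sqrt(full[i][i] * full[j][j]).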

7.2.11 Files Pen*(.dat) and ValidateLogLike.dat

The following ﬁles are created in conjunction with penalized estimation. Some can be used by WOMBAT in additional steps.

7.2.11.1 File PenEstimates.dat

This ﬁle gives a brief summary of estimates together with log likelihood values for all values of the tuning parameter given.

7.2.11.2 File PenBestPoints.dat

This file collects the BestPoints for all tuning parameters. The format for each is similar to that for BestPoint (see subsection 7.3.3), except that the first line has 3 entries comprising the tuning factor, the maximum penalized likelihood and the corresponding unpenalized value. It is suitable as input for additional calculations aimed at comparing estimates, and used as input file for ‘validation’ runs (see run option --valid, subsection 5.3.2).

Output to this ﬁle is cumulative, i.e. if it exists in the working directory it is not over-written but appended to.

7.2.11.3 File PenCanEigvalues.dat

Similarly, this ﬁle collects the values of the tuning factors and corresponding penalized and unpenalized log likelihood values. It has one line for each tuning factor. For a penalty on the canonical eigenvalues, estimates of the latter are written out as well (in descending order). Again, if this ﬁle exists in the working directory it is not over-written but appended to.

7.2.11.4 File PenTargetMatrix

If the option PHENV (see subsection 4.17.16) is specified, WOMBAT writes out this file with a suitable target matrix. For penalty COVARM a covariance matrix and for CORREL the corresponding correlation matrix is given. For a multivariate analysis fitting a simple animal model, this is the phenotypic covariance (correlation) matrix. For a random regression analysis, corresponding matrices are based on the sum of the covariance matrices among random regression coefficients due to the two random effects fitted, assumed to represent individuals’ genetic and permanent environmental effects (which must be fitted using the same number of basis functions).

Written out is the upper triangle of the matrix.

7.2.11.5 File ValidateLogLike.dat

This file is the output resulting from a run with the option --valid. It contains one line per tuning factor with the following entries:

(a): A running number
(b): The tuning factor
(c): The unpenalized log likelihood in the validation data.
(d): The penalized log likelihood in the training data.
(e): The unpenalized log likelihood in the training data.

If this ﬁle exists in the working directory it is not over-written but appended to.

7.2.12 File CovSamples_name.dat

This (potentially large) file contains the samples drawn from the multivariate normal distribution of estimates, either for all random effects in the analysis (name = ALL) or for a single, selected effect. The file contains one line per replicate, with covariance matrices written in the same sequence as in the corresponding estimation run (for ALL), giving the upper triangle for each matrix.
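Samples of this kind allow sampling variances of functions of covariance components to be approximated empirically. The Python sketch below splits one replicate line into upper-triangle vectors, given the orders of the matrices, and then summarises a variance ratio across replicates. It assumes a univariate analysis with two 1 × 1 matrices on each line, residual first and genetic second; that ordering and all function names are assumptions to be checked against the estimation run.

```python
import statistics


def split_sample(line, orders):
    """Split one replicate line into upper-triangle vectors, one per matrix,
    given the matrix orders in the sequence of the estimation run."""
    values = [float(v) for v in line.split()]
    parts, pos = [], 0
    for q in orders:
        size = q * (q + 1) // 2
        parts.append(values[pos:pos + size])
        pos += size
    if pos != len(values):
        raise ValueError("matrix orders do not match the number of values")
    return parts


def ratio_summary(lines):
    """Mean and standard deviation across replicates of g / (g + e),
    ASSUMING each line holds a 1x1 residual (e) then a 1x1 genetic (g) matrix."""
    ratios = []
    for line in lines:
        if not line.strip():
            continue
        (e,), (g,) = split_sample(line, [1, 1])
        ratios.append(g / (g + e))
    return statistics.mean(ratios), statistics.stdev(ratios)
```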

7.3 ‘Utility’ ﬁles

In addition, WOMBAT produces a number of small ‘utility’ ﬁles. These serve to monitor progress during estimation, to carry information over to subsequent runs or to facilitate specialised post-estimation calculations by the user.

7.3.1 File ListOfCovs

This file lists the covariance components defined by the model of analysis, together with their running numbers and starting values given. It is written out during the ‘set-up’ phase (see subsection 5.2.3). It can be used to identify the running numbers needed when defining additional functions of covariance components to be evaluated (see section 4.12).

7.3.2 File RepeatedRecordsCounts

This ﬁle gives a count of the numbers of repeated records per trait and, if option TSELECT is used, a count of the number of pairs taken at the same time.

7.3.3 File BestPoint

Whenever WOMBAT encounters a set of parameters which improves the likelihood, the currently ‘best’ point is written out to the ﬁle BestPoint.

The ﬁrst line of BestPoint gives the following information :

(a): The current value of the log likelihood,
(b): The number of parameters

This is followed by the covariance matrices estimated.

1.: Only the upper triangle, written out row-wise, is given.
2.: Each covariance matrix starts on a new line.
3.: The ﬁrst matrix given is the matrix of residual covariances. The other covariance matrices are given in the same order as the matrices of starting values were speciﬁed in the parameter ﬁle.

N.B.: BestPoint is used in any continuation or post-estimation steps, so do not delete it until the analysis is complete !
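For post-estimation calculations outside WOMBAT, BestPoint could be parsed as sketched below. As the file does not record the order of each matrix, the orders must be supplied by the user (e.g. from ListOfCovs or the parameter file). The sketch reads all values after the first line in sequence, so a matrix may span several lines; the function name is hypothetical.

```python
def read_bestpoint(text, orders):
    """Return (log likelihood, number of parameters, list of full symmetric
    matrices) from BestPoint; the residual matrix comes first."""
    lines = text.strip().splitlines()
    first = lines[0].split()
    loglik, n_par = float(first[0]), int(float(first[1]))
    values = [float(v) for line in lines[1:] for v in line.split()]
    matrices, pos = [], 0
    for q in orders:
        full = [[0.0] * q for _ in range(q)]
        for r in range(q):
            for c in range(r, q):   # upper triangle, row-wise
                full[r][c] = full[c][r] = values[pos]
                pos += 1
        matrices.append(full)
    if pos != len(values):
        raise ValueError("matrix orders do not match the number of values")
    return loglik, n_par, matrices
```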

7.3.4 File Iterates

WOMBAT appends a line of summary information to the ﬁle Iterates on completion of an iterate of the AI, PX-EM or EM algorithm. This can be used to monitor the progress of an estimation run – useful for long runs in background mode. Each line gives the following information :

(a): Column 1 gives a two-letter code identifying the algorithm used (AI, PX or EM).
(b): Column 2 gives the running number of the iterate.
(c): Column 3 contains the log likelihood value at the end of the iterate.
(d): Column 4 gives the change in log likelihood from the previous iterate.
(e): Column 5 shows the norm of the vector of ﬁrst derivatives of the log likelihood (zero for PX and EM)
(f): Column 6 gives the norm of the vector of changes in the parameters, divided by the norm of the vector of parameters.
(g): Column 7 gives the Newton decrement (absolute value) for the iterate (zero for PX and EM).
(h): Column 8 shows a) the step size factor used for AI steps, b) the average squared deviation of the matrices of additional parameters in the PX-EM algorithm from the identity matrix, c) zero for EM steps.
(i): Column 9 gives the CPU time for the iterate in seconds
(j): Column 10 gives the number of likelihood evaluations carried out so far.
(k): Column 11 gives the factor used to ensure that the average information matrix is ‘safely’ positive deﬁnite
(l): Column 12 identiﬁes the method used to modify the average information matrix (0: no modiﬁcation, 1: modify eigenvalues directly, 2: add diagonal matrix, 3: modiﬁed Cholesky decomposition, 4: partial Cholesky decomposition – see section A.5).
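For long runs in background mode, a few lines of Python suffice to follow progress. The sketch below reads the first four columns of Iterates; the tolerance in looks_converged is an arbitrary illustration, not WOMBAT's convergence criterion, and both function names are hypothetical.

```python
def read_iterates(lines):
    """Algorithm, iterate number, log likelihood and change in log
    likelihood (columns 1 to 4) for each line of the Iterates file."""
    rows = []
    for line in lines:
        f = line.split()
        if len(f) < 4:
            continue
        rows.append({"algorithm": f[0], "iterate": int(f[1]),
                     "loglik": float(f[2]), "change": float(f[3])})
    return rows


def looks_converged(rows, tol=5e-4):
    """Crude check: the last change in log likelihood is below tol."""
    return bool(rows) and abs(rows[-1]["change"]) < tol
```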

7.3.5 File OperationCounts

This small file accumulates the number of non-zero elements in the Cholesky factor (or inverse) of the mixed model matrix, together with the resulting operation count for the factorisation. This can be used to compare the efficacy of different ordering strategies for a particular analysis. The file contains one line per ordering tried, with the following information :

(a): The name of the ordering strategy (mmd, amd or metis).
(b): For metis only : the values of the three options which can be set by the user, i.e. the number of graph separators used, the ‘density’ factor, and the option selecting the edge matching strategy (see subsection 5.3.1).
(c): The number of non-zero elements in the mixed model matrix after factorisation.
(d): The operation count for the factorisation.
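Because lines for metis carry three additional fields, it is simplest to take the operation count from the last field of each line. A hypothetical helper picking the ordering with the lowest operation count might look like this:

```python
def best_ordering(lines):
    """(line, operation count) for the ordering with the smallest operation
    count; the last field of each line is taken as the operation count."""
    best = None
    for line in lines:
        f = line.split()
        if len(f) < 3:
            continue
        ops = float(f[-1])
        if best is None or ops < best[1]:
            best = (line.strip(), ops)
    return best
```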

7.3.6 Files AvInfoParms and AvInfoCovs

These ﬁles are written out when the AI algorithm is used. After each iterate, they give the average information matrix (not its inverse !) corresponding to the ‘best’ estimates obtained by an AI step, as written out to BestPoint. These can be used to approximate sampling variances and errors of genetic parameters.

N.B.: If the AI iterates are followed by further estimation steps using a different algorithm, the average information matrices given may no longer pertain to the ‘best’ estimates.

AvInfoParms contains the average information matrix for the parameters estimated. Generally, the parameters are the elements of the leading columns of the Cholesky factors of the covariance matrices estimated. This ﬁle is written out for both full and reduced rank estimation.

For full rank estimation, the average information is first calculated with respect to the covariance components and then transformed to the Cholesky scale. Hence, the average information for the covariances is available directly, and is written to the file AvInfoCovs.

Both ﬁles give the elements of the upper triangle of the symmetric information matrix row-wise. The ﬁrst line gives the log likelihood value for the estimates to which the matrix pertains – this can be used to ensure corresponding ﬁles of estimates and average information are used. Each of the following lines in the ﬁle represents one element of the matrix, containing 3 variables :

(a): row number,
(b): column number, and
(c): element of the average information matrix.

N.B.: Written out are the information matrices for all parameters. If some parameters (or covariances) are not estimated (such as zero residual covariances for traits measured on diﬀerent animals), the corresponding rows and columns may be zero.
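As stated above, inverting the average information matrix gives approximate sampling (co)variances. The numpy sketch below (function name hypothetical) rebuilds the symmetric matrix from the row, column, element triplets, drops parameters whose rows are entirely zero (not estimated) and returns approximate sampling errors as square roots of the diagonal of the inverse. For AvInfoParms these refer to the parameters on the Cholesky scale, not to the covariances.

```python
import numpy as np


def sampling_errors(lines):
    """(log likelihood, approximate sampling errors) from AvInfoParms or
    AvInfoCovs; parameters with all-zero rows are excluded and get NaN."""
    loglik = float(lines[0].split()[0])
    triplets = [l.split() for l in lines[1:] if l.strip()]
    n = max(max(int(t[0]), int(t[1])) for t in triplets)
    info = np.zeros((n, n))
    for r, c, v in triplets:
        i, j = int(r) - 1, int(c) - 1
        info[i, j] = info[j, i] = float(v)   # upper triangle mirrored
    keep = np.flatnonzero(np.any(info != 0.0, axis=1))
    se = np.full(n, np.nan)
    cov = np.linalg.inv(info[np.ix_(keep, keep)])
    se[keep] = np.sqrt(np.diag(cov))
    return loglik, se
```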

7.3.7 Files Covariable.baf

For random regression analyses, ﬁle(s) with the basis functions evaluated for the values of the control variable(s) in the data are written out. These can be used, for example, in calculating covariances of predicted random eﬀects at speciﬁc points.

The name of a file is equal to the name of the covariable (or ‘control’ variable), as given in the parameter file (model of analysis part), followed by the option describing the form of basis function (POL, LEG, BSP; see subsection 4.10.2), the maximum number of coefficients, and the extension .baf. The file then contains one row for each value of the covariable, giving the covariable, followed by the coefficients of the basis function.

NB. These ﬁles pertain to the random regressions ﬁtted! Your model may contain a ﬁxed regression on the same covariable with the same number of speciﬁed regression coeﬃcients, n, but with intercept omitted. If so, the coeﬃcients in this ﬁle are not appropriate to evaluate the ﬁxed regression curve.
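For example, the covariance between random effects at two values t1 and t2 of the control variable is φ(t1)' K φ(t2), with φ(t) the row of basis function coefficients for t and K the estimated covariance matrix among the random regression coefficients for that effect. A numpy sketch follows; the function names are hypothetical and it assumes each .baf line is whitespace-separated with the covariable value first.

```python
import numpy as np


def read_baf(lines):
    """{covariable value: array of basis function coefficients}."""
    basis = {}
    for line in lines:
        f = line.split()
        if f:
            basis[float(f[0])] = np.array([float(v) for v in f[1:]])
    return basis


def covariance_function(phi_1, phi_2, K):
    """Covariance at two control-variable values: phi_1' K phi_2."""
    return float(np.asarray(phi_1) @ np.asarray(K) @ np.asarray(phi_2))
```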

7.3.8 File LogL4Quapprox.dat

For analyses involving an additional parameter, the values used for the parameter and the corresponding maximum log likelihoods are collected in this file. This is meant to facilitate estimation of the parameter through a quadratic approximation of the resulting profile likelihood curve via the run option --quapp. Note that this file is appended to at each run.
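The idea behind a quadratic approximation can be sketched as follows: fit a parabola a·p² + b·p + c to the (parameter, log likelihood) pairs by least squares and take its maximum at p = -b/(2a). This numpy illustration assumes that the first two columns of LogL4Quapprox.dat hold the parameter value and the log likelihood (an assumption); the function names are hypothetical and it is not a substitute for run option --quapp.

```python
import numpy as np


def read_pairs(lines):
    """Parameter values and log likelihoods from the first two columns."""
    params, logliks = [], []
    for line in lines:
        f = line.split()
        if len(f) >= 2:
            params.append(float(f[0]))
            logliks.append(float(f[1]))
    return params, logliks


def quadratic_maximum(params, logliks):
    """Location of the maximum of the least-squares parabola through the
    points; requires at least three distinct parameter values."""
    a, b, _ = np.polyfit(params, logliks, 2)   # highest power first
    if a >= 0:
        raise ValueError("fitted curve has no maximum")
    return -b / (2 * a)
```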

7.3.9 File SubSetsList

If analyses considering a subset of traits are carried out, WOMBAT writes out files EstimSubsetn + … + m.dat (see subsection 7.2.6), to be used as input files in a run with option --itsum or --pool. In addition, for each run performed, the name of this file is appended to SubSetsList. This file contains one line per ‘partial’ run with two entries: the file name (EstimSubsetn + … + m.dat) and a weight given to the corresponding results when combining estimates. The default for the weight is unity.

7.4 Miscellaneous

7.4.1 File ReducedPedFile.dat

As one of the ﬁrst steps in analyses ﬁtting an animal model, WOMBAT checks the pedigree ﬁle supplied against the data ﬁle and, if applicable, deletes any individuals not linked to the data in any way. The new, reduced pedigree is written to this ﬁle. Like the original pedigree ﬁle, it contains 3 columns, i.e. codes for animal, sire and dam.

7.4.2 Files PrunedPedFilen.dat

The second ‘pedigree modification’ carried out is ‘pruning’. This is performed for each genetic effect separately (provided they are assumed to be uncorrelated). The corresponding pedigrees are written to these files, with n = 1,2,… pertaining to the order in which these effects are specified in the model part of the parameter file. Each has 7 columns: columns 1 to 3 give the animal, sire and dam recoded in running order, columns 4 to 6 give the corresponding original identities, and column 7 contains the inbreeding coefficient of the individual.

7.4.3 File WOMBAT.log

This file collects ‘time stamps’ for various stages of a run, together with explanatory messages for programmed stops. The content largely duplicates what is written to the screen. However, the file is closed after each message is written, in contrast to a capture of the screen output which, under Linux, may be incomplete. It is intended for those running a multitude of analyses and wanting to trace what went on. N.B. This file is appended to, i.e. it will collect information for multiple runs in the current directory if not deleted between runs.
