Chapter 7
Output generated by WOMBAT



This chapter describes the files written out by WOMBAT. These comprise ‘internal’ files generated by WOMBAT for use within a run (and thus of limited interest to the user), and various output files. Most files have default names, independent of the analysis.

7.1 Main results files

These are formatted summary files to be read (rather than processed by other programs). They have the extension .out.

7.1.1 File SumPedigrees.out

If a pedigree file is specified, this file gives some summary statistics on the pedigree information found.

7.1.2 File SumModel.out

This file gives a summary about the model of analysis specified and the corresponding features of the data found. Statistics given include means, standard deviations and ranges for traits and covariables, and numbers of levels found for the effects in the model of analysis.

It is written after the first processing of the input files, i.e. during the set-up steps.

7.1.3 File SumEstimates.out

This file provides the parameter estimates, the resulting covariance matrices and their eigenvalues and, for reduced rank analyses, the corresponding matrices of eigenvectors. WOMBAT also writes out the corresponding matrices of correlations and variance ratios. In addition, values for any user-defined functions of covariances (see section 4.12) are written out.

If the final estimates were obtained using the AI algorithm, WOMBAT provides approximate sampling errors for the parameters and covariance components estimated, as given by the inverse of the respective average information matrices. In addition, sampling errors of variance ratios and correlations are derived, as described in subsection A.4.2.

7.1.4 File BestSoFar.out

This is an abbreviated version of SumEstimates.out, written out when the command line option --best is specified. It gives matrices of estimates pertaining to the set of parameters with the highest likelihood found so far.

7.1.5 File FixSolutions.out

This file lists the generalised least-squares solutions for all fixed effects fitted, together with ‘raw’ means and numbers of observations for individual subclasses.

HINT: If this file is the ‘by-product’ of an estimation run using the AI-REML algorithm (default), no standard errors for fixed effects are given. The reason is that the AI algorithm does not involve calculation of the inverse of the coefficient matrix of the mixed model equations. Asymptotic lower bound standard errors are written out, however, if the (PX-)EM algorithm is used in the last iterate performed, or if a BLUP run is specified, as both evaluate this inverse. You can force WOMBAT to calculate standard errors by supplying the option FORCE-SE in a SPECIAL block (see subsection 4.17.4). Note, though, that this may require additional calculations to invert the coefficient matrix, which can be demanding for large analyses.

7.1.6 File SumSampleAI.out

This file is only written out when specifying run option --sample (see subsection 5.2.9). It gives a brief summary of the characteristics of the analysis and average information matrix for which samples were obtained, together with means and variances across replicates. In addition, large deviations between theoretical results from the information matrix and samples obtained are flagged.

7.2 Additional results

These are large files, most likely subject to further processing by other programs. Thus they contain minimal or no text. They have extension .dat.

7.2.1 File Residuals.dat

This file gives the residuals for all observations, for the model fitted and current estimates of covariance components. It has the same order as the data file, and contains 3 or more space-separated columns:

(a)
Column 1 contains the estimated residual.
(b)
Column 2 gives the corresponding predicted observation (y-hat).
(c)
Column 3 gives the corresponding observation.
(d)
Column 4 gives the observation adjusted for fixed effects for analyses with run options --blup, --solvit or --s1step. This is useful in calculating predictive accuracies of breeding values (forward cross-validation).

The first line contains the column names. Note that this file is not produced for runs involving iterative solution of the mixed model equations.

Summary statistics about the distribution of residuals can be readily obtained using standard statistical packages. For example, the following R commands compute means, standard deviations and quartiles, and plot the residuals against the observations as well as a histogram of the residuals:

EXAMPLE:

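res <- read.table("Residuals.dat", header = TRUE)  # first line holds the column names
summary(res); sapply(res, sd)                      # means, quartiles; standard deviations
par(mfrow = c(1, 2))                               # two plots side by side
plot(res[, 1], res[, 3]); hist(res[, 1])           # residuals vs. observations; histogram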

7.2.2 File(s) RnSoln_rname.dat

Solutions for each random effect are written to a separate file. These files have names RnSoln_rname.dat, with rname representing the name of the random effect. Columns in these files are:

(a)
Column 1 gives the running number for the level considered.
(b)
Column 2 gives the corresponding original code.
(c)
Column 3 gives the ‘effect’ number, where, depending on the analysis, ‘effect’ is a trait, principal component or random regression coefficient.
(d)
Column 4 gives the solution.
(e)
Column 5 gives the sampling error of the solution, calculated as the square root of the corresponding diagonal element of the inverse of the coefficient matrix in the mixed model equations. This is only available if the last iterate has been carried out using an EM or PX-EM algorithm (or if FORCE-SE has been invoked).

For genetic random effects with covariance option NRM, WOMBAT calculates inbreeding coefficients from the list of pedigrees specified. For such effects, there may be an additional column containing these coefficients (in %). This should be the last column in the RnSoln_rname.dat file.

There may be up to 7 columns – ignore column 6 unless you recognize the numbers (column 6 attempts to give accuracies of estimates, but calculations are not fully debugged; o.k. for simple models but not all cases).

If you have carried out a reduced rank analysis, i.e. give the PC option for the analysis type, the solutions in RnSoln_rname.dat pertain to the principal components! You might then also be interested in the corresponding solutions on the original scale – WOMBAT endeavours to calculate these for you and writes them to the file RnSoln_rname-tr.dat. However, if you have carried out a run in which you have calculated standard errors for the effects fitted, these are ignored in the back-transformation and you will find that column 5 (in RnSoln_rname-tr.dat) consists entirely of zeros – this does not mean that these s.e. are zero, only that they have not been determined.

For multi-trait random regression analyses, regression coefficients are ordered by trait, i.e. coefficients 1, 2, … for trait 1 followed by coefficients 1, 2, … for trait 2, and so on.
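The column layout above can be read with a few lines of Python. This is only a minimal sketch: the function name read_rnsoln is hypothetical, the columns are assumed to be whitespace-separated in the order listed, and lines whose leading fields are not numeric (for instance a possible heading) are simply skipped, which is an assumption rather than documented behaviour.

```python
from collections import defaultdict

def read_rnsoln(lines):
    """Group solutions by original code: {code: {effect number: solution}}."""
    solutions = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or incomplete line
        try:
            int(fields[0])               # column 1: running number of the level
            effect_no = int(fields[2])   # column 3: trait, PC or RR coefficient
            solution = float(fields[3])  # column 4: solution
        except ValueError:
            continue  # assumed to be a heading or other text line
        solutions[fields[1]][effect_no] = solution  # column 2: original code
    return dict(solutions)

# Made-up lines for two animals and two effects (not real WOMBAT output).
example = [
    "1 1001 1 0.52",
    "1 1001 2 -0.13",
    "2 1002 1 -0.40",
    "2 1002 2 0.07",
]
by_animal = read_rnsoln(example)
print(by_animal["1001"])
```

Keeping the original code as a string avoids any assumption about how identities are coded; any further columns (sampling errors, inbreeding coefficients) are ignored here.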

7.2.3 File(s) Curve_cvname(_trname).dat

At convergence, curves for fixed covariables fitted are evaluated and written to separate files, one per covariable and trait. These have names Curve_cvname.dat for univariate analyses and Curve_cvname_trname.dat for multivariate analyses, with cvname the name of the covariable as specified in the parameter file and, correspondingly, trname the name of the trait. Curves are only evaluated at points corresponding to the nearest integer values of the values found in the data. Each file has four columns:

(a)
Column 1 gives the value of the covariable.
(b)
Column 2 gives the point on the fitted curve.
(c)
Column 3 contains the number of observations with this value of the covariable.
(d)
Column 4 gives the corresponding raw mean.

HINT: To get most information from these files, it might be worth your while scaling your covariables prior to analysis!

7.2.4 File(s) RanRegname.dat

For random regression analyses, WOMBAT evaluates variance components (file RanRegVariances.dat), variance ratios (file RanRegVarRatios.dat, not written if more than one control variable is used) and selected correlations (RanRegCorrels.dat) for values of the control variable(s) occurring in the data. If approximate sampling variances of parameters are available, WOMBAT attempts to approximate the corresponding sampling errors. The general layout of the files is as follows:

(a)
Column 1 gives the running number of the value of the control variable.
(b)
Column 2 gives the corresponding actual value (omitted if more than one control variable).
(c)
The following columns give the variance components, ratios or correlations.
(i)
If sampling errors are available, each source of variation is represented by two columns, i.e. value followed by the approximate lower bound sampling error, with additional spaces between ‘pairs’ of numbers.
(ii)
Random effects are listed in the same order as the starting values for random effects covariances are given in the parameter file.
(iii)
If the same control variable is used for all random effects, WOMBAT attempts to calculate a total, ‘phenotypic’ variance and corresponding variance ratios and correlations.
(iv)
Correlations are calculated for 5 values of the control variable, corresponding to lowest and highest value, and 3 approximately equidistant intermediate values.

In addition, the files contain some rudimentary headings.

If the special option RRCORR-ALL has been specified (see subsection 4.17.14), a file RanRegCorrAll.dat is written out in addition. This contains the following columns:

(a)
The name of the random effect
(b)
The running number for trait one
(c)
The running number for trait two
(d)
The running number for the pair of traits
(e)
The value of the control variable (“age”) for trait one
(f)
The value of the control variable for trait two
(g)
The estimated covariance between traits one and two for the specified ages
(h)
The corresponding correlation

7.2.5 Files SimDatan.dat

Simulated records are written to files with the standard name SimDatan.dat, where n is a three-digit integer value (i.e. 001, 002, …). These files have the same number of columns as specified for the data file in the parameter file (i.e. any trailing variables in the original data file not listed are ignored), with the trait values replaced by simulated records. These variables are followed by the simulated values for individual random effects: the first of these values is the residual error term, the other values are the random effects as sampled (standard uni-/multivariate analyses) or as evaluated using the random regression coefficients sampled, in the same order as the corresponding covariance matrices are specified in the parameter file. Except for the trait number in multivariate analyses (first variable), all variables are written out as real values.

7.2.6 Files EstimSubSetn+…+m.dat

If an analysis considering a subset of traits is carried out, WOMBAT writes out a file EstimSubsetn+…+m.dat with the estimates of covariance matrices for this analysis. Writing of this file is ‘switched on’ when encountering the syntax “m”->n, specifying the trait number in the parameter file (see subsection 4.10.7). The first two lines of EstimSubsetn+…+m.dat give the following information:

(a)
The number of traits in the subset and their names, as given in the parameter file.
(b)
The corresponding trait numbers in the ‘full’ analysis.

This is followed by the covariance matrices estimated. The first matrix given is the matrix of residual covariances, the other covariance matrices are given in the same order as specified in the parameter file.

(c)
The first line for each covariance matrix gives the running number of the random effect, the order of fit and the name of the effect
(d)
The following lines give the elements of covariance matrix, with one line per row.

Finally, EstimSubsetn+…+m.dat gives some information on the data structure (not used subsequently):

(e)
The number of records for each trait
(f)
The number of individuals with pairs of records
(g)
The number of levels for the random effects fitted

7.2.7 Files PDMatrix.dat and PDBestPoint

These files give the pooled covariance matrices, obtained by running WOMBAT with option --itsum.

PDMatrix.dat is meant to be readily pasted (as starting values) into the parameter file for an analysis considering all traits simultaneously. It contains the following information for each covariance matrix:

(a)
A line with the qualifier VAR, followed by the name of the random effect and the order and rank of the covariance matrix.
(b)
The elements of the upper triangle of the covariance matrix; these are written out as one element per line.
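For checking a PDMatrix.dat file outside WOMBAT, the blocks described above can be expanded into full symmetric matrices. The following Python sketch assumes that the upper triangle is listed row-wise and that each VAR line has the form "VAR name order rank"; the function name read_pdmatrix and the example values are illustrative only.

```python
def read_pdmatrix(lines):
    """Return {effect name: full symmetric matrix as a list of rows}."""
    matrices = {}
    rows = (line.split() for line in lines if line.strip())
    for fields in rows:
        if fields[0] != "VAR":
            continue
        name, order = fields[1], int(fields[2])
        mat = [[0.0] * order for _ in range(order)]
        for i in range(order):
            for j in range(i, order):       # assumed: row-wise upper triangle
                value = float(next(rows)[0])  # one element per line
                mat[i][j] = mat[j][i] = value
        matrices[name] = mat
    return matrices

# Made-up content for two 2 x 2 matrices (not real WOMBAT output).
example = [
    "VAR animal 2 2", "4.0", "1.5", "9.0",
    "VAR residual 2 2", "6.0", "0.0", "8.0",
]
mats = read_pdmatrix(example)
print(mats["animal"])
```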

PDBestPoint has the same form as BestPoint (see subsection 7.3.3). It is meant to be copied (or renamed) to BestPoint, so that WOMBAT can be run with option --best to generate a ‘results’ file (BestSoFar.out) with correlations, variance ratios and eigenvalues of the pooled covariance matrices.

7.2.8 Files PoolEstimates.out and PoolBestPoint

These files provide the results from a run with the option --pool:

1.
PoolEstimates.out summarizes characteristics of the part estimates provided (input), options chosen, and results for all analyses carried out.
2.
PoolBestPoint is the equivalent of BestPoint. If penalized analyses are carried out, copies labelled PoolBestPoint_unpen and PoolBestPoint_txx, with xx equal to the tuning factor, are generated so that files for all sub-analyses are available at the end of the run.

7.2.9 Files MME*.dat

The two files MMECoeffMatrix.dat and MMEEqNos+Solns.dat are written out when the run option --mmeout is specified.

MMECoeffMatrix.dat
contains the non-zero elements in the lower triangle of the coefficient matrix in the MME. There is one line per element, containing 3 space-separated variables:
(a)
The row number (integer); in running order from 1 to N, with N the total number of equations.
(b)
The column number (integer); in running order from 1 to N.
(c)
The element (real).

HINT: This file is in the correct format to be inverted using run option --invert or --invrev.
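Because only the non-zero elements of the lower triangle are stored, a reader has to supply the symmetry and the zeros itself. A minimal Python sketch follows, assuming three whitespace-separated fields per line as described above; the function names and example values are hypothetical.

```python
def read_lower_triangle(lines):
    """Store the non-zero lower-triangle elements as {(row, column): value}."""
    elements = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a row/column/element triplet
        row, col, value = int(fields[0]), int(fields[1]), float(fields[2])
        elements[(row, col)] = value
    return elements

def coefficient(elements, i, j):
    """Element (i, j) of the symmetric matrix; entries not stored are zero."""
    return elements.get((max(i, j), min(i, j)), 0.0)

# Made-up triplets for a system with 3 equations (not real WOMBAT output).
example = ["1 1 10.0", "2 1 -2.5", "2 2 7.0", "3 3 4.0"]
elements = read_lower_triangle(example)
print(coefficient(elements, 1, 2), coefficient(elements, 3, 1))
```

A dictionary keyed by (row, column) keeps memory use proportional to the number of non-zero elements, which matters since N can be large.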

MMEEqNos+Solns.dat
provides the mapping of equation numbers (1 to N) to effects in the model, as well as the right hand sides and solutions. This file has one line per equation, with the following space-separated variables:
(a)
The equation number (integer).
(b)
The name of the trait, truncated to 12 letters for long names.
(c)
The name of the effect, truncated to 12 letters (For random effects and analyses using the PC option, this is replaced by PCn).
(d)
The original code for this level and effect (integer); for covariables this is replaced by the ‘set number’ (= 1 for non-nested covariables).
(e)
The running number for this level (within effect).
(f)
The right hand side in the MME (real).
(g)
The solution for this effect (real).

7.2.10 Files SNPSolutions.dat and SNPCovariances.dat

SNPSolutions.dat is the output file for a run using option --snap. It contains one line for each line found in the input file SNPCounts.dat, with the following entries:

(a)
The estimated SNP effect (regression coefficient).
(b)
Its standard error (from the inverse of the coefficient matrix).
(c)
The t-value, i.e. the ratio of the two preceding variables (estimate divided by its standard error).
(d)
A character variable with a name for the line.

SNPCovariances.dat is generated only when multiple SNPs are considered simultaneously and the option COVOUT is specified (see subsection 4.17.7). This causes the upper triangle of the matrix of sampling covariances to be written to this file (n(n + 1)/2 elements for n SNPs, space-separated).
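The two pieces of arithmetic in this subsection are the t-value (estimate divided by its standard error) and the count of n(n + 1)/2 elements in the upper triangle. The short Python sketch below illustrates both; the function names are hypothetical, and inferring n from the number of values read is merely a consistency check one might apply to SNPCovariances.dat.

```python
import math

def t_value(estimate, std_error):
    """Ratio of the estimated SNP effect to its standard error."""
    return estimate / std_error

def n_snps_from_count(n_elements):
    """Solve n(n + 1)/2 = n_elements for n; fail if no integer n fits."""
    n = (math.isqrt(8 * n_elements + 1) - 1) // 2
    if n * (n + 1) // 2 != n_elements:
        raise ValueError("count is not of the form n(n + 1)/2")
    return n

t = t_value(0.5, 0.25)     # made-up estimate and standard error
n = n_snps_from_count(6)   # 6 elements correspond to 3 SNPs
print(t, n)
```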

7.2.11 Files Pen*(.dat) and ValidateLogLike.dat

The following files are created in conjunction with penalized estimation. Some can be used by WOMBAT in additional steps.

7.2.11.1 File PenEstimates.dat

This file gives a brief summary of estimates together with log likelihood values for all values of the tuning parameter given.

7.2.11.2 File PenBestPoints.dat

This file collects the BestPoint information for all tuning parameters. The format for each is similar to that for BestPoint (see subsection 7.3.3), except that the first line has 3 entries comprising the tuning factor, the maximum penalized likelihood and the corresponding unpenalized value. It is suitable as input for additional calculations aimed at comparing estimates, and is used as input file for ‘validation’ runs (see run option --valid, subsection 5.3.2).

Output to this file is cumulative, i.e. if it exists in the working directory it is not over-written but appended to.

7.2.11.3 File PenCanEigvalues.dat

Similarly, this file collects the values of the tuning factors and corresponding penalized and unpenalized log likelihood values. It has one line for each tuning factor. For a penalty on the canonical eigenvalues, estimates of the latter are written out as well (in descending order). Again, if this file exists in the working directory it is not over-written but appended to.

7.2.11.4 File PenTargetMatrix

If the option PHENV (see subsection 4.17.16) is specified, WOMBAT writes out this file with a suitable target matrix. For penalty COVARM a covariance matrix and for CORREL the corresponding correlation matrix is given. For a multivariate analysis fitting a simple animal model, this is the phenotypic covariance (correlation) matrix. For a random regression analysis, corresponding matrices are based on the sum of the covariance matrices among random regression coefficients due to the two random effects fitted, assumed to represent individuals’ genetic and permanent environmental effects (which must be fitted using the same number of basis functions).

Written out is the upper triangle of the matrix.

7.2.11.5 File ValidateLogLike.dat

This file is the output resulting from a run with the option --valid. It contains one line per tuning factor with the following entries:

(a)
A running number
(b)
The tuning factor
(c)
The unpenalized log likelihood in the validation data.
(d)
The penalized log likelihood in the training data.
(e)
The unpenalized log likelihood in the training data.

If this file exists in the working directory it is not over-written but appended to.
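One possible way of using this file is to pick the tuning factor with the highest unpenalized log likelihood in the validation data. The Python sketch below does this, averaging over repeated lines for the same tuning factor since the file is appended to across runs. Both the selection rule and the averaging are assumptions about how a user might proceed, not a description of what WOMBAT does; the function name and numbers are made up.

```python
from collections import defaultdict

def best_tuning_factor(lines):
    """Return (tuning factor, mean validation log likelihood) with the highest mean."""
    by_factor = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            factor = float(fields[1])      # column 2: tuning factor
            valid_logl = float(fields[2])  # column 3: unpenalized logL, validation data
        except ValueError:
            continue  # not a data line
        by_factor[factor].append(valid_logl)
    means = {f: sum(v) / len(v) for f, v in by_factor.items()}
    best = max(means, key=means.get)
    return best, means[best]

# Made-up lines from two validation runs with three tuning factors each.
example = [
    "1 0.0 -512.4 -830.1 -830.1",
    "2 5.0 -508.9 -832.0 -830.9",
    "3 10.0 -510.2 -834.6 -832.3",
    "1 0.0 -498.0 -840.0 -840.0",
    "2 5.0 -495.5 -841.5 -840.6",
    "3 10.0 -497.1 -843.9 -841.8",
]
factor, mean_logl = best_tuning_factor(example)
print(factor, round(mean_logl, 2))
```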

7.2.12 File CovSamples_name.dat

This (potentially large) file contains the samples drawn from the multivariate normal distribution of estimates, either for all random effects in the analysis (name = ALL) or for a single, selected effect. The file contains one line per replicate, with covariance matrices written in the same sequence as in the corresponding estimation run (for ALL), giving the upper triangle for each matrix.

7.3 ‘Utility’ files

In addition, WOMBAT produces a number of small ‘utility’ files. These serve to monitor progress during estimation, to carry information over to subsequent runs or to facilitate specialised post-estimation calculations by the user.

7.3.1 File ListOfCovs

This file lists the covariance components defined by the model of analysis, together with their running numbers and starting values given. It is written out during the ‘set-up’ phase (see subsection 5.2.3). It can be used to identify the running numbers needed when defining additional functions of covariance components to be evaluated (see section 4.12).

7.3.2 File RepeatedRecordsCounts

This file gives a count of the numbers of repeated records per trait and, if option TSELECT is used, a count of the number of pairs taken at the same time.

7.3.3 File BestPoint

Whenever WOMBAT encounters a set of parameters which improves the likelihood, the currently ‘best’ point is written out to the file BestPoint.

The first line of BestPoint gives the following information:

(a)
The current value of the log likelihood,
(b)
The number of parameters

This is followed by the covariance matrices estimated.

1.
Only the upper triangle, written out row-wise, is given.
2.
Each covariance matrix starts on a new line.
3.
The first matrix given is the matrix of residual covariances. The other covariance matrices are given in the same order as the matrices of starting values were specified in the parameter file.

N.B.: BestPoint is used in any continuation or post-estimation steps – do not delete it until the analysis is complete!

7.3.4 File Iterates

WOMBAT appends a line of summary information to the file Iterates on completion of an iterate of the AI, PX-EM or EM algorithm. This can be used to monitor the progress of an estimation run, which is useful for long runs in background mode. Each line gives the following information:

(a)
Column 1 gives a two-letter code identifying the algorithm (AI, PX, EM) used.
(b)
Column 2 gives the running number of the iterate.
(c)
Column 3 contains the log likelihood value at the end of the iterate.
(d)
Column 4 gives the change in log likelihood from the previous iterate.
(e)
Column 5 shows the norm of the vector of first derivatives of the log likelihood (zero for PX and EM)
(f)
Column 6 gives the norm of the vector of changes in the parameters, divided by the norm of the vector of parameters.
(g)
Column 7 gives the Newton decrement (absolute value) for the iterate (zero for PX and EM).
(h)
Column 8 shows a) the step size factor used for AI steps, b) the average squared deviation of the matrices of additional parameters in the PX-EM algorithm from the identity matrix, c) zero for EM steps.
(i)
Column 9 gives the CPU time for the iterate in seconds
(j)
Column 10 gives the number of likelihood evaluations carried out so far.
(k)
Column 11 gives the factor used to ensure that the average information matrix is ‘safely’ positive definite
(l)
Column 12 identifies the method used to modify the average information matrix (0: no modification, 1: modify eigenvalues directly, 2: add diagonal matrix, 3: modified Cholesky decomposition, 4: partial Cholesky decomposition – see section A.5).
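To monitor a long run, the twelve columns can be read into named fields. The following Python sketch assumes whitespace-separated columns in the order listed; the field names are invented for illustration, and the conversion of Fortran-style exponents (e.g. 1.5D-03) is a precaution rather than a documented property of the file.

```python
def to_float(token):
    # Fortran-style exponents such as 1.5D-03 are converted as a precaution.
    return float(token.replace("D", "E").replace("d", "e"))

# (hypothetical field name, converter) for columns 1 to 12
FIELDS = [
    ("algorithm", str), ("iterate", int), ("log_lik", to_float),
    ("change", to_float), ("gradient_norm", to_float),
    ("rel_param_change", to_float), ("newton_decrement", to_float),
    ("step_info", to_float), ("cpu_seconds", to_float),
    ("n_lik_evals", int), ("ai_factor", to_float), ("ai_method", int),
]

def parse_iterates(lines):
    """Return one dictionary per complete line of the Iterates file."""
    records = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < len(FIELDS):
            continue  # incomplete line, e.g. while the file is being written
        records.append({name: conv(tok) for (name, conv), tok in zip(FIELDS, tokens)})
    return records

# Made-up lines (not real WOMBAT output).
example = [
    "AI 1 -1520.312 0.000 12.40 0.2500 3.100 1.000 0.85 2 0.000 0",
    "AI 2 -1498.771 21.541 2.10 0.0800 0.450 1.000 0.83 3 0.000 0",
    "EM 3 -1498.502 0.269 0.00 0.0100 0.000 0.000 1.10 4 0.000 0",
]
records = parse_iterates(example)
last = records[-1]
print(last["algorithm"], last["iterate"], last["log_lik"], last["change"])
```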

7.3.5 File OperationCounts

This small file accumulates the number of non-zero elements in the Cholesky factor (or inverse) of the mixed model matrix together with the resulting operation count for the factorisation. This can be used to compare the efficacy of different ordering strategies for a particular analysis. The file contains one line per ordering tried, with the following information:

(a)
The name of the ordering strategy (mmd, amd or metis).
(b)
For metis only: the values of the three options which can be set by the user, i.e. the number of graph separators used, the ‘density’ factor, and the option selecting the edge matching strategy (see subsection 5.3.1).
(c)
The number of non-zero elements in the mixed model matrix after factorisation.
(d)
The operation count for the factorisation.
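Comparing orderings then amounts to finding the line with the smallest operation count. A minimal Python sketch follows, assuming whitespace-separated fields with the strategy name first and the two counts last, so that the extra metis options (if present) fall in between; the function name and numbers are made up.

```python
def cheapest_ordering(lines):
    """Return the entry with the lowest operation count for the factorisation."""
    best = None
    for line in lines:
        tokens = line.split()
        if len(tokens) < 3:
            continue
        entry = {
            "name": tokens[0],               # mmd, amd or metis
            "options": tokens[1:-2],         # only filled for metis
            "nonzeros": float(tokens[-2]),   # non-zero elements after factorisation
            "op_count": float(tokens[-1]),   # operation count
        }
        if best is None or entry["op_count"] < best["op_count"]:
            best = entry
    return best

# Made-up lines (not real WOMBAT output).
example = [
    "mmd 184220 9513004",
    "amd 179850 9120466",
    "metis 5 0 1 160312 7744120",
]
best = cheapest_ordering(example)
print(best["name"], best["options"], best["op_count"])
```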

7.3.6 Files AvInfoParms and AvInfoCovs

These files are written out when the AI algorithm is used. After each iterate, they give the average information matrix (not its inverse!) corresponding to the ‘best’ estimates obtained by an AI step, as written out to BestPoint. These can be used to approximate sampling variances and errors of genetic parameters.

N.B.: If the AI iterates are followed by further estimation steps using a different algorithm, the average information matrices given may not pertain to the ‘best’ estimates any longer.

AvInfoParms contains the average information matrix for the parameters estimated. Generally, the parameters are the elements of the leading columns of the Cholesky factors of the covariance matrices estimated. This file is written out for both full and reduced rank estimation.

For full rank estimation, the average information is first calculated with respect to the covariance components and then transformed to the Cholesky scale. Hence, the average information for the covariances is available directly, and is written to the file AvInfoCovs.

Both files give the elements of the upper triangle of the symmetric information matrix row-wise. The first line gives the log likelihood value for the estimates to which the matrix pertains; this can be used to ensure corresponding files of estimates and average information are used. Each of the following lines in the file represents one element of the matrix, containing 3 variables:

(a)
row number,
(b)
column number, and
(c)
element of the average information matrix.

N.B.: Written out are the information matrices for all parameters. If some parameters (or covariances) are not estimated (such as zero residual covariances for traits measured on different animals), the corresponding rows and columns may be zero.
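Since the files hold the information matrix rather than its inverse, approximate sampling errors require an inversion step (section 7.1.3 states that they are given by the inverse of the average information matrix). The Python sketch below reads the triplets, drops all-zero rows and columns as mentioned in the note above, and takes square roots of the diagonal of the inverse. It assumes that no further scaling of the matrix is needed; all function names and numbers are illustrative.

```python
import math

def read_avinfo(lines):
    """Return (log likelihood, full symmetric matrix) from the triplet lines."""
    log_lik = float(lines[0].split()[0])
    triplets = []
    for line in lines[1:]:
        if line.strip():
            r, c, v = line.split()[:3]
            triplets.append((int(r), int(c), float(v)))
    n = max(max(r, c) for r, c, _ in triplets)
    mat = [[0.0] * n for _ in range(n)]
    for r, c, v in triplets:
        mat[r - 1][c - 1] = mat[c - 1][r - 1] = v
    return log_lik, mat

def invert(mat):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(mat)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def sampling_errors(mat):
    """sqrt(diag(inverse)); parameters with all-zero rows get None."""
    keep = [i for i, row in enumerate(mat) if any(x != 0.0 for x in row)]
    inv = invert([[mat[i][j] for j in keep] for i in keep])
    errors = [None] * len(mat)
    for k, i in enumerate(keep):
        errors[i] = math.sqrt(inv[k][k])
    return errors

# Made-up file content: log likelihood, then row/column/element triplets.
example = ["-1498.502", "1 1 4.0", "1 2 1.0", "2 2 2.0", "3 3 0.0"]
log_lik, info = read_avinfo(example)
errors = sampling_errors(info)
print(log_lik, errors)
```

The log likelihood returned by read_avinfo can be compared with the first line of BestPoint to confirm that both files refer to the same estimates.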

7.3.7 Files Covariable.baf

For random regression analyses, file(s) with the basis functions evaluated for the values of the control variable(s) in the data are written out. These can be used, for example, in calculating covariances of predicted random effects at specific points.

The name of a file is equal to the name of the covariable (or ‘control’ variable), as given in the parameter file (model of analysis part), followed by the option describing the form of basis function (POL, LEG, BSP; see subsection 4.10.2), the maximum number of coefficients, and the extension .baf. The file then contains one row for each value of the covariable, giving the covariable, followed by the coefficients of the basis function.

N.B.: These files pertain to the random regressions fitted! Your model may contain a fixed regression on the same covariable with the same number of specified regression coefficients, n, but with intercept omitted. If so, the coefficients in this file are not appropriate to evaluate the fixed regression curve.
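One common use of the basis function values is to evaluate a covariance function at two values of the control variable, i.e. phi(t1)' K phi(t2), where K is a covariance matrix among random regression coefficients and phi(t) the row of the .baf file for value t. The Python sketch below illustrates this; the function names, the basis values and K are made up, and K would have to be supplied by the user (for example from SumEstimates.out).

```python
def read_baf(lines):
    """Map each covariable value to its vector of basis function coefficients."""
    basis = {}
    for line in lines:
        values = [float(tok) for tok in line.split()]
        if values:
            basis[values[0]] = values[1:]
    return basis

def covariance_at(basis, k_matrix, t1, t2):
    """Evaluate phi(t1)' K phi(t2) for covariable values t1 and t2."""
    phi1, phi2 = basis[t1], basis[t2]
    return sum(phi1[i] * k_matrix[i][j] * phi2[j]
               for i in range(len(phi1)) for j in range(len(phi2)))

# Made-up basis values for three ages and two coefficients (intercept, slope).
example = ["1 1.0 -1.0", "2 1.0 0.0", "3 1.0 1.0"]
basis = read_baf(example)
K = [[2.0, 0.5],
     [0.5, 1.0]]   # hypothetical covariances among regression coefficients
print(covariance_at(basis, K, 1, 3), covariance_at(basis, K, 3, 3))
```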

7.3.8 File LogL4Quapprox.dat

For analyses involving an additional parameter, the values used for the parameter and the corresponding maximum log likelihoods are collected in this file. This is meant to facilitate estimation of the parameter through a quadratic approximation of the resulting profile likelihood curve via the run option --quapp. Note that this file is appended to at each run.
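The idea of a quadratic approximation can be illustrated with three (parameter value, log likelihood) pairs: fit the parabola through them and take its vertex as the estimate. The Python sketch below shows the arithmetic only; it is not claimed to be the procedure implemented by --quapp, and the function name and numbers are made up.

```python
def quadratic_maximum(points):
    """Vertex of the parabola through three (x, y) points; requires a maximum."""
    (x1, y1), (x2, y2), (x3, y3) = points
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    # Coefficients of y = a*x^2 + b*x + c through the three points.
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    c = y1 - a * x1**2 - b * x1
    if a >= 0:
        raise ValueError("points do not describe a curve with a maximum")
    x_max = -b / (2 * a)
    return x_max, a * x_max**2 + b * x_max + c

# Made-up profile: parameter values 0, 1 and 4 with their maximum log likelihoods.
x_hat, logl_hat = quadratic_maximum([(0.0, 6.0), (1.0, 9.0), (4.0, 6.0)])
print(x_hat, logl_hat)
```

The approximation is only sensible if the three points bracket the maximum; with more points in the file, a least-squares fit of the quadratic would be the natural extension.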

7.3.9 File SubSetsList

If analyses considering a subset of traits are carried out, WOMBAT writes out files EstimSubsetn+…+m.dat (see subsection 7.2.6), to be used as input files in a run with option --itsum or --pool. In addition, for each run performed, the name of this file is appended to SubSetsList. SubSetsList contains one line per ‘partial’ run with two entries: the file name (EstimSubsetn+…+m.dat) and a weight given to the corresponding results when combining estimates. The default for the weight is unity.

7.4 Miscellaneous

7.4.1 File ReducedPedFile.dat

As one of the first steps in analyses fitting an animal model, WOMBAT checks the pedigree file supplied against the data file and, if applicable, deletes any individuals not linked to the data in any way. The new, reduced pedigree is written to this file. Like the original pedigree file, it contains 3 columns, i.e. codes for animal, sire and dam.

7.4.2 Files PrunedPedFilen.dat

The second ‘pedigree modification’ carried out is ‘pruning’. This is performed for each genetic effect separately (provided they are assumed to be uncorrelated). The corresponding pedigrees are written to these files, with n = 1,2,… pertaining to the order in which these effects are specified in the model part of the parameter file. Each has 7 columns: columns 1 to 3 give the animal, sire and dam recoded in running order, columns 4 to 6 give the corresponding original identities, and column 7 contains the inbreeding coefficient of the individual.
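The seven columns make it straightforward to map the recoded running numbers back to the original identities. A minimal Python sketch, assuming whitespace-separated columns; the function name is hypothetical, and the example lines (including the coding of unknown parents as 0 and the scale of the inbreeding coefficient) are made up for illustration.

```python
def read_pruned_pedigree(lines):
    """Return {running animal number: (original identity, inbreeding coefficient)}."""
    lookup = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue
        running_no = int(fields[0])     # column 1: animal, recoded in running order
        original_id = fields[3]         # column 4: original animal identity
        inbreeding = float(fields[6])   # column 7: inbreeding coefficient
        lookup[running_no] = (original_id, inbreeding)
    return lookup

# Made-up pedigree: animal 4 results from mating sire 1 to his daughter 3.
example = [
    "1 0 0 1001 0 0 0.0",
    "2 0 0 1002 0 0 0.0",
    "3 1 2 1005 1001 1002 0.0",
    "4 1 3 1010 1001 1005 0.25",
]
pedigree = read_pruned_pedigree(example)
print(pedigree[4])
```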

7.4.3 File WOMBAT.log

This file collects ‘time stamps’ for various stages of a run, together with explanatory messages for programmed stops. The content largely duplicates what is written to the screen. However, the file is closed after any message written – in contrast to a capture of the screen output which, under Linux, may be incomplete. It is intended for those running a multitude of analyses and wanting to trace what went on. N.B.: This file is appended to, i.e. it will collect information for multiple runs in the current directory if not deleted between runs.