This chapter describes the files written out by WOMBAT. These comprise ‘internal’ files generated by WOMBAT for use within a run (and thus of limited interest to the user), and various output files. Most files have default names, independent of the analysis.
These are formatted summary files to be read (rather than processed by other programs). They have the extension .out.
If a pedigree file is specified, this file gives some summary statistics on the pedigree information found.
This file gives a summary about the model of analysis specified and the corresponding features of the data found. Statistics given include means, standard deviations and ranges for traits and covariables, and numbers of levels found for the effects in the model of analysis.
It is written after the first processing of the input files, i.e. during the set-up steps.
This file provides the estimates of the parameters, the resulting covariance matrices and their eigenvalues and, for reduced rank analyses, the corresponding matrices of eigenvectors. WOMBAT also writes out the corresponding matrices of correlations and variance ratios. In addition, values for any user-defined functions of covariances (see section 4.12) are written out.
If the final estimates were obtained using the AI algorithm, WOMBAT provides approximate sampling errors for the parameters and covariance components estimated, as given by the inverse of the respective average information matrices. In addition, sampling errors of variance ratios and correlations are derived, as described in subsection A.4.2.
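To illustrate how a sampling error for a variance ratio can be derived from the sampling covariances of the components, the sketch below applies the usual first-order Taylor series (delta method) approximation to a heritability h² = σ²A/(σ²A + σ²E). It is only a sketch: the function name `heritability_se`, the two-component model and the numbers are hypothetical, and it is not claimed to reproduce the exact calculations described in subsection A.4.2.

```python
import math

def heritability_se(var_a, var_e, samp_cov):
    """Approximate sampling error of h2 = var_a / (var_a + var_e).

    samp_cov is the 2 x 2 sampling covariance matrix of (var_a, var_e),
    e.g. taken from the inverse of the average information matrix.
    Uses the first-order Taylor series (delta method) approximation.
    """
    var_p = var_a + var_e
    h2 = var_a / var_p
    # Partial derivatives of h2 with respect to var_a and var_e
    grad = [var_e / var_p ** 2, -var_a / var_p ** 2]
    var_h2 = sum(grad[i] * samp_cov[i][j] * grad[j]
                 for i in range(2) for j in range(2))
    return h2, math.sqrt(var_h2)

# Hypothetical estimates and sampling covariance matrix
h2, se = heritability_se(40.0, 60.0, [[25.0, -10.0], [-10.0, 30.0]])
```

Correlations can be handled in the same way, with the gradient taken with respect to the two variances and the covariance involved.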
This is an abbreviated version of SumEstimates.out, written out when the command line option --best is specified. It gives matrices of estimates pertaining to the set of parameters with the highest likelihood found so far.
This file lists the generalised least-squares solutions for all fixed effects fitted, together with ‘raw’ means and numbers of observations for individual subclasses.
HINT: If this file is the ‘by-product’ of an estimation run using the AI-REML algorithm (default), no standard errors for fixed effects are given. The reason is that the AI algorithm does not involve calculation of the inverse of the coefficient matrix of the mixed model equations. Asymptotic lower bound standard errors are written out, however, if the (PX-)EM algorithm is used in the last iterate performed, or if a BLUP run is specified, as both evaluate this inverse. You can force WOMBAT to calculate standard errors by supplying the option FORCE-SE in a SPECIAL block (see subsection 4.17.4). Note, though, that this may require additional calculations to invert the coefficient matrix, which can be demanding for large analyses.
This file is only written out when specifying run option --sample (see subsection 5.2.9). It gives a brief summary of the characteristics of the analysis and the average information matrix for which samples were obtained, together with means and variances across replicates. In addition, large deviations between the theoretical results from the information matrix and the samples obtained are flagged.
These are large files, most likely subject to further processing by other programs. Thus they contain little or no text. They have extension .dat.
This file gives the residuals for all observations, for the model fitted and the current estimates of covariance components. It has the same order as the data file, and contains 3 or more space separated columns:
The first line contains the column names. Note that this file is not produced for runs involving iterative solution of the mixed model equations.
Summary statistics about the distribution of residuals can be readily obtained using standard statistical packages. For example, the following R commands read the file (using the first line as column names), compute means, quartiles and standard deviations, and plot column 3 against column 1 together with a histogram of the residuals (column 1) :
EXAMPLE:
res <- read.table("Residuals.dat", header = TRUE)
summary(res); sapply(res, sd)
par(mfrow = c(1, 2))
plot(res[, 1], res[, 3]); hist(res[, 1])
Solutions for each random effect are written to a separate file. These files have names RnSoln_rname.dat, with rname representing the name of the random effect. Columns in these files are :
For genetic random effects with covariance option NRM, WOMBAT calculates inbreeding coefficients from the list of pedigrees specified. For such effects, there may be an additional column containing these coefficients (in %). This should be the last column in the RnSoln_rname.dat file.
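For readers wanting to check these values on a small pedigree, inbreeding coefficients can be obtained with the simple tabular method for the numerator relationship matrix, sketched below. This is an illustration only: WOMBAT may well use a different, more efficient algorithm, and the input convention (tuples of animal, sire and dam, with 0 for unknown parents and parents listed before their progeny) and the function name `inbreeding_percent` are hypothetical choices. As in RnSoln_rname.dat, results are given in %.

```python
def inbreeding_percent(pedigree):
    """Inbreeding coefficients (in %) via the tabular method.

    pedigree: list of (animal, sire, dam) tuples, with 0 for unknown
    parents, ordered so that parents precede their progeny.
    Builds the full numerator relationship matrix, so it is only
    suitable for small pedigrees.
    """
    index = {animal: pos for pos, (animal, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = index.get(sire)  # None if the sire is unknown (0)
        d = index.get(dam)   # None if the dam is unknown (0)
        if (s is not None and s >= i) or (d is not None and d >= i):
            raise ValueError("parents must precede their progeny")
        for j in range(i):
            # Relationship with an earlier animal: half the sum of that
            # animal's relationships with the known parents
            total = (a[j][s] if s is not None else 0.0) + \
                    (a[j][d] if d is not None else 0.0)
            a[i][j] = a[j][i] = 0.5 * total
        # Diagonal: 1 plus half the relationship between the parents
        a[i][i] = 1.0 + (0.5 * a[s][d] if s is not None and d is not None else 0.0)
    return {animal: 100.0 * (a[pos][pos] - 1.0) for animal, pos in index.items()}

# Hypothetical pedigree: animal 5 is the progeny of a full-sib mating
ped = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 2), (5, 3, 4)]
f = inbreeding_percent(ped)
```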
There may be up to 7 columns. Column 6 attempts to give accuracies of the estimates, but these calculations are not fully debugged: they are o.k. for simple models, but not in all cases. Ignore column 6 unless you recognize the numbers.
If you have carried out a reduced rank analysis, i.e. specified the PC option for the analysis type, the solutions in RnSoln_rname.dat pertain to the principal components! You might then also be interested in the corresponding solutions on the original scale; WOMBAT endeavours to calculate these for you and writes them to the file RnSoln_rname-tr.dat. However, if you have carried out a run in which standard errors for the effects fitted were calculated, these are ignored in the back-transformation and you will find that column 5 (in RnSoln_rname-tr.dat) consists entirely of zeros. This does not mean that these standard errors are zero, only that they have not been determined.
For multi-trait random regression analyses, regression coefficients are ordered by trait, i.e. coefficients 1, 2, … for trait 1, followed by coefficients 1, 2, … for trait 2, and so on.
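With this ordering, the running number of a given coefficient is simply the number of coefficients for all preceding traits plus its number within the trait. The sketch below computes this position; the function name `coefficient_position` is hypothetical, and allowing a different number of coefficients per trait is an assumption made for generality.

```python
def coefficient_position(trait, coef, n_coef):
    """1-based position of coefficient `coef` of trait `trait`.

    n_coef lists the number of regression coefficients for each trait,
    e.g. [3, 3] for two traits with 3 coefficients each.
    Coefficients are assumed to be ordered by trait.
    """
    if not 1 <= trait <= len(n_coef) or not 1 <= coef <= n_coef[trait - 1]:
        raise ValueError("trait or coefficient number out of range")
    return sum(n_coef[:trait - 1]) + coef

# First coefficient of trait 2 when each of two traits has 3 coefficients
pos = coefficient_position(2, 1, [3, 3])
```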
At convergence, curves for fixed covariables fitted are evaluated and written to separate files, one per covariable and trait. These have names Curve_cvname.dat for univariate analyses and Curve_cvname_trname.dat for multivariate analyses, with cvname the name of the covariable as specified in the parameter file and, correspondingly, trname the name of the trait. Curves are only evaluated at the nearest integer values of the covariable values found in the data. Each file has four columns :
HINT: To get most information from these files, it might be worth your while scaling your covariables prior to analysis!
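The reason for this hint is easy to see: if evaluation points are the distinct nearest integers of the covariable values, a covariable with a narrow range collapses to very few points. The sketch below illustrates this with hypothetical values; the function name `curve_points` and the `scale` argument are assumptions, and rounding halves upwards is a guess, since the exact rounding rule used by WOMBAT is not stated here.

```python
import math

def curve_points(values, scale=1.0):
    """Distinct nearest-integer points at which a curve would be evaluated.

    values: covariable values found in the data; scale: hypothetical
    factor applied to the covariable before the analysis.
    """
    # floor(x + 0.5) rounds halves up (assumed rounding rule)
    return sorted({math.floor(v * scale + 0.5) for v in values})

ages = [0.12, 0.47, 0.88]  # hypothetical covariable values
unscaled = curve_points(ages)         # only 2 points
scaled = curve_points(ages, 100.0)    # one point per value
```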
For random regression analyses, WOMBAT evaluates variance components (file RanRegVariances.dat), variance ratios (file RanRegVarRatios.dat, not written if more than one control variable is used) and selected correlations (RanRegCorrels.dat) for the values of the control variable(s) occurring in the data. If approximate sampling variances of the parameters are available, WOMBAT attempts to approximate the corresponding sampling errors as well. The general layout of the files is as follows :
In addition, the files contain some rudimentary headings.
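The values in these files follow the standard random regression result that the covariance between values at control variable points t1 and t2 is φ(t1)′Kφ(t2), with K the covariance matrix among the regression coefficients for a random effect and φ(t) the vector of basis functions evaluated at t (as written to the .baf files). The sketch below recomputes a variance and a correlation this way; it assumes K and the φ vectors are already at hand, the numbers are hypothetical, and it does not attempt the sampling errors.

```python
import math

def quad_form(phi1, k, phi2):
    """phi1' K phi2 for basis-function vectors phi1, phi2 and matrix K."""
    return sum(phi1[i] * k[i][j] * phi2[j]
               for i in range(len(phi1)) for j in range(len(phi2)))

def rr_variance(phi, k):
    """Variance at the point where the basis functions equal phi."""
    return quad_form(phi, k, phi)

def rr_correlation(phi1, phi2, k):
    """Correlation between the values at two points."""
    return quad_form(phi1, k, phi2) / math.sqrt(
        rr_variance(phi1, k) * rr_variance(phi2, k))

# Hypothetical covariance matrix among two regression coefficients
k = [[2.0, 0.5], [0.5, 1.0]]
# Hypothetical basis-function values at two points of the control variable
phi_t1, phi_t2 = [1.0, 0.0], [1.0, 1.0]
r12 = rr_correlation(phi_t1, phi_t2, k)
```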
If the special option RRCORR-ALL has been specified (see subsection 4.17.14), a file RanRegCorrAll.dat is written out in addition. This contains the following columns:
Simulated records are written to files with the standard name SimDatan.dat, where n is a three-digit integer value (i.e. 001, 002, …). These files have the same number of columns as specified for the data file in the parameter file (i.e. any trailing variables in the original data file that are not listed are ignored), with the trait values replaced by simulated records. These variables are followed by the simulated values for the individual random effects. The first of these values is the residual error term; the other values are the random effects as sampled (standard uni-/multivariate analyses) or as evaluated using the sampled random regression coefficients (random regression analyses), in the same order as the corresponding covariance matrices are specified in the parameter file. Except for the trait number in multivariate analyses (first variable), all variables are written out as real values.
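Given this layout, each record can be split into the data-file columns, the simulated residual and the simulated random effects once the number of data-file columns in the parameter file is known. The sketch below does this and also builds the file name for replicate n; both function names are hypothetical, and the values are kept as strings for the data-file part so that the caller can decide how to convert them.

```python
def sim_file_name(n):
    """Name of the n-th simulated data file, e.g. SimData001.dat."""
    return f"SimData{n:03d}.dat"

def split_sim_record(line, n_data_cols):
    """Split one record into data columns, residual and random effects.

    n_data_cols is the number of columns specified for the data file in
    the parameter file; the values after them are the simulated residual
    followed by the random effects, in parameter-file order.
    """
    fields = line.split()
    data = fields[:n_data_cols]
    effects = [float(x) for x in fields[n_data_cols:]]
    if not effects:
        raise ValueError("no simulated effects found")
    return data, effects[0], effects[1:]

# Hypothetical record: 3 data-file columns, residual, one random effect
record = split_sim_record("1 101 2.5 0.3 1.2", 3)
```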
If an analysis considering a subset of traits is carried out, WOMBAT writes out a file EstimSubsetn+…+m.dat with the estimates of covariance matrices for this analysis. Writing of this file is ‘switched on’ when encountering the syntax “m”->n, specifying the trait number in the parameter file (see subsection 4.10.7). The first two lines of EstimSubsetn+…+m.dat give the following information :
This is followed by the covariance matrices estimated. The first matrix given is the matrix of residual covariances, the other covariance matrices are given in the same order as specified in the parameter file.
Finally, EstimSubsetn+…+m.dat gives some information on the data structure (not used subsequently) :
These files give the pooled covariance matrices, obtained by running WOMBAT with option --itsum.
PDMatrix.dat is meant to be readily pasted (as starting values) into the parameter file for an analysis considering all traits simultaneously. It contains the following information for each covariance matrix :
PDBestPoint has the same form as BestPoint (see subsection 7.3.3). It is meant to be copied (or renamed) to BestPoint, so that WOMBAT can be run with option --best to generate a ‘results’ file (BestSoFar) with correlations, variance ratios and eigenvalues of the pooled covariance matrices.
These files provide the results from a run with the option --pool:
The two files MMECoeffMatrix.dat and MMEEqNos+Solns.dat are written out when the run option --mmeout is specified.
HINT: This file is in the correct format to be inverted using run option --invert or --invrev.
This is the output file for a run using option --snap. It contains one line for each line found in the input file SNPCounts.dat containing
SNPCovariances.dat is generated only when multiple SNPs are considered simultaneously and the option COVOUT is specified (see subsection 4.17.7). This causes the upper triangle of the matrix of sampling covariances to be written to this file (n(n + 1)∕2 elements for n SNPs, space separated).
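To use these values, the full symmetric matrix can be rebuilt from the n(n + 1)∕2 elements, with n recovered from the number of elements. The sketch below assumes the upper triangle is written row-wise, which is an assumption not stated above; the function name `unpack_upper` is hypothetical.

```python
import math

def unpack_upper(values):
    """Rebuild a symmetric n x n matrix from its n(n+1)/2 upper-triangle
    elements, assumed to be stored row-wise."""
    m = len(values)
    n = (math.isqrt(8 * m + 1) - 1) // 2
    if n * (n + 1) // 2 != m:
        raise ValueError("number of elements is not n(n+1)/2")
    mat = [[0.0] * n for _ in range(n)]
    pos = 0
    for i in range(n):
        for j in range(i, n):
            mat[i][j] = mat[j][i] = values[pos]
            pos += 1
    return mat

# Hypothetical line for 3 SNPs
cov = unpack_upper([1, 2, 3, 4, 5, 6])
```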
The following files are created in conjunction with penalized estimation. Some can be used by WOMBAT in additional steps.
This file gives a brief summary of estimates together with log likelihood values for all values of the tuning parameter given.
This file collects the BestPoints for all tuning parameters. The format for each is similar to that for BestPoint (see subsection 7.3.3), except that the first line has 3 entries comprising the tuning factor, the maximum penalized likelihood and the corresponding unpenalized value. It is suitable as input for additional calculations aimed at comparing estimates, and is used as input file for ‘validation’ runs (see run option --valid, subsection 5.3.2).
Output to this file is cumulative, i.e. if it exists in the working directory it is not over-written but appended to.
Similarly, this file collects the values of the tuning factors and corresponding penalized and unpenalized log likelihood values. It has one line for each tuning factor. For a penalty on the canonical eigenvalues, estimates of the latter are written out as well (in descending order). Again, if this file exists in the working directory it is not over-written but appended to.
If the option PHENV (see subsection 4.17.16) is specified, WOMBAT writes out this file with a suitable target matrix. For penalty COVARM a covariance matrix and for CORREL the corresponding correlation matrix is given. For a multivariate analysis fitting a simple animal model, this is the phenotypic covariance (correlation) matrix. For a random regression analysis, corresponding matrices are based on the sum of the covariance matrices among random regression coefficients due to the two random effects fitted, assumed to represent individuals’ genetic and permanent environmental effects (which must be fitted using the same number of basis functions).
Written out is the upper triangle of the matrix.
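For the random regression case, the construction of the target can be sketched as follows: add the two coefficient covariance matrices, rescale to correlations for CORREL, and take the upper triangle. The matrices are hypothetical, the function names are illustrative only, and a row-wise order for the upper triangle is assumed.

```python
import math

def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def to_correlation(cov):
    """Correlation matrix corresponding to a covariance matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

def upper_triangle(mat):
    """Upper triangle, row-wise (assumed order)."""
    return [mat[i][j] for i in range(len(mat)) for j in range(i, len(mat))]

# Hypothetical covariance matrices among regression coefficients
k_genetic = [[2.0, 0.5], [0.5, 1.0]]
k_perm_env = [[2.0, 0.5], [0.5, 3.0]]
target_cov = add_matrices(k_genetic, k_perm_env)           # COVARM
target_corr = upper_triangle(to_correlation(target_cov))   # CORREL
```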
This file is the output resulting from a run with the option --valid. It contains one line per tuning factor with the following entries:
If this file exists in the working directory it is not over-written but appended to.
This (potentially large) file contains the samples drawn from the multivariate normal distribution of estimates, either for all random effects in the analysis (name = ALL) or for a single, selected effect. The file contains one line per replicate, with covariance matrices written in the same sequence as in the corresponding estimation run (for ALL), giving the upper triangle for each matrix.
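Each replicate line can therefore be split into the upper triangles of the successive matrices, provided the order of each matrix is known from the estimation run. The sketch below does this; the function name `split_sample_line` and its `orders` argument (to be supplied by the user) are hypothetical, and the line is assumed to hold the matrices one after the other with nothing in between.

```python
def split_sample_line(line, orders):
    """Split one replicate line into the upper triangles of successive matrices.

    orders gives the order (number of rows) of each covariance matrix, in
    the sequence used in the estimation run; returns a list of flat lists.
    """
    values = [float(x) for x in line.split()]
    chunks, pos = [], 0
    for q in orders:
        size = q * (q + 1) // 2
        chunks.append(values[pos:pos + size])
        pos += size
    if pos != len(values):
        raise ValueError("line length does not match the matrix orders given")
    return chunks

# Hypothetical replicate with two 2 x 2 covariance matrices
parts = split_sample_line("1 0.5 2 3 0.1 4", [2, 2])
```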
In addition, WOMBAT produces a number of small ‘utility’ files. These serve to monitor progress during estimation, to carry information over to subsequent runs or to facilitate specialised post-estimation calculations by the user.
This file lists the covariance components defined by the model of analysis, together with their running numbers and the starting values given. It is written out during the ‘set-up’ phase (see subsection 5.2.3). It can be used to identify the running numbers needed when defining additional functions of covariance components to be evaluated (see section 4.12).
This file gives a count of the numbers of repeated records per trait and, if option TSELECT is used, a count of the number of pairs taken at the same time.
Whenever WOMBAT encounters a set of parameters which improves the likelihood, the currently ‘best’ point is written out to the file BestPoint.
The first line of BestPoint gives the following information :
This is followed by the covariance matrices estimated.
N.B.: BestPoint is used in any continuation or post-estimation steps, so do not delete it until the analysis is complete !
WOMBAT appends a line of summary information to the file Iterates on completion of an iterate of the AI, PX-EM or EM algorithm. This can be used to monitor the progress of an estimation run – useful for long runs in background mode. Each line gives the following information :
This small file accumulates the number of non-zero elements in the Cholesky factor (or inverse) of the mixed model matrix, together with the resulting operation count for the factorisation. This can be used to compare the efficacy of different ordering strategies for a particular analysis. The file contains one line per ordering tried, with the following information :
These files are written out when the AI algorithm is used. After each iterate, they give the average information matrix (not its inverse !) corresponding to the ‘best’ estimates obtained by an AI step, as written out to BestPoint. These can be used to approximate sampling variances and errors of genetic parameters.
N.B.: If the AI iterates are followed by further estimation steps using a different algorithm, the average information matrices given may no longer pertain to the ‘best’ estimates.
AvInfoParms contains the average information matrix for the parameters estimated. Generally, the parameters are the elements of the leading columns of the Cholesky factors of the covariance matrices estimated. This file is written out for both full and reduced rank estimation.
For full rank estimation, the average information is first calculated with respect to the covariance components and then transformed to the Cholesky scale. Hence, the average information for the covariances is available directly, and is written to the file AvInfoCovs.
Both files give the elements of the upper triangle of the symmetric information matrix row-wise. The first line gives the log likelihood value for the estimates to which the matrix pertains – this can be used to ensure corresponding files of estimates and average information are used. Each of the following lines in the file represents one element of the matrix, containing 3 variables :
N.B.: Written out are the information matrices for all parameters. If some parameters (or covariances) are not estimated (such as zero residual covariances for traits measured on different animals), the corresponding rows and columns may be zero.
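Approximate sampling errors can be obtained from these files by inverting the information matrix and taking square roots of the diagonal elements. The sketch below assumes that each line after the first holds the row number, the column number and the element (1-based indices); this layout, and all function names, are assumptions. Following the note above, rows and columns that are entirely zero are dropped before inverting. The small Gauss-Jordan inversion is adequate for a modest number of parameters; for larger problems a linear algebra library would be preferable.

```python
import math

def read_avinfo(lines):
    """Return (log likelihood, full symmetric matrix) from the file's lines."""
    log_l = float(lines[0].split()[0])
    entries = []
    for line in lines[1:]:
        fields = line.split()
        if fields:  # assumed layout: row, column, value
            entries.append((int(fields[0]), int(fields[1]), float(fields[2])))
    n = max(max(i, j) for i, j, _ in entries)
    mat = [[0.0] * n for _ in range(n)]
    for i, j, value in entries:
        mat[i - 1][j - 1] = mat[j - 1][i - 1] = value
    return log_l, mat

def invert(mat):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(mat)
    aug = [row[:] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def sampling_errors(mat):
    """Square roots of the diagonal of the inverse, keyed by 1-based
    parameter number; all-zero rows/columns (not estimated) are skipped."""
    keep = [k for k in range(len(mat)) if any(v != 0.0 for v in mat[k])]
    inv = invert([[mat[r][c] for c in keep] for r in keep])
    return {k + 1: math.sqrt(inv[pos][pos]) for pos, k in enumerate(keep)}

# Hypothetical file contents: parameter 2 was not estimated
lines = ["-1234.5", "1 1 4.0", "1 2 0.0", "1 3 0.0",
         "2 2 0.0", "2 3 0.0", "3 3 25.0"]
log_l, info = read_avinfo(lines)
se = sampling_errors(info)
```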
For random regression analyses, file(s) with the basis functions evaluated for the values of the control variable(s) in the data are written out. These can be used, for example, in calculating covariances of predicted random effects at specific points.
The name of a file is equal to the name of the covariable (or ‘control’ variable), as given in the parameter file (model of analysis part), followed by the option describing the form of basis function (POL, LEG, BSP; see subsection 4.10.2), the maximum number of coefficients, and the extension .baf. The file then contains one row for each value of the covariable, giving the covariable, followed by the coefficients of the basis function.
NB. These files pertain to the random regressions fitted! Your model may contain a fixed regression on the same covariable with the same number of specified regression coefficients, n, but with intercept omitted. If so, the coefficients in this file are not appropriate to evaluate the fixed regression curve.
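As an example of using a .baf file, the sketch below reads its rows and evaluates the random regression curve of one individual at each covariable value, as the sum of basis function values times the individual's regression coefficients (which could be taken from RnSoln_rname.dat). The rows and coefficients are hypothetical and the function names are illustrative only. In line with the note above, this applies to the random regressions, not to a fixed regression curve.

```python
def read_baf(lines):
    """Map each covariable value to its list of basis-function values.

    Each row is assumed to hold the covariable value followed by the
    basis-function coefficients, separated by white space.
    """
    table = {}
    for line in lines:
        fields = line.split()
        if fields:
            table[float(fields[0])] = [float(x) for x in fields[1:]]
    return table

def evaluate_curve(phi, coefficients):
    """Curve value: sum of basis-function values times coefficients."""
    if len(phi) != len(coefficients):
        raise ValueError("numbers of basis functions and coefficients differ")
    return sum(p * b for p, b in zip(phi, coefficients))

# Hypothetical .baf rows and predicted coefficients for one individual
baf = read_baf(["10 1.0 -0.5", "20 1.0 0.5"])
curve = {t: evaluate_curve(phi, [2.0, 4.0]) for t, phi in baf.items()}
```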
For analyses involving an additional parameter, the values used for the parameter and the corresponding maximum log likelihoods are collected in this file. This is meant to facilitate estimation of the parameter through a quadratic approximation of the resulting profile likelihood curve via the run option --quapp. Note that this file is appended to at each run.
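The idea behind such a quadratic approximation can be sketched with three points: fit the parabola through three (parameter value, log likelihood) pairs and take the parameter value at its maximum. This is a minimal sketch only; the function name is hypothetical, and the procedure used by --quapp (for instance, how it uses more than three points) may differ.

```python
def quadratic_maximum(points):
    """Parameter value at the maximum of the parabola through three points.

    points: three (parameter value, maximum log likelihood) pairs.
    """
    (x1, y1), (x2, y2), (x3, y3) = sorted(points)
    # Leading coefficient of the parabola; must be negative for a maximum
    curvature = ((y3 - y2) / (x3 - x2) - (y2 - y1) / (x2 - x1)) / (x3 - x1)
    if curvature >= 0.0:
        raise ValueError("log likelihoods do not describe a maximum")
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den

# Hypothetical profile log likelihood values at three parameter values
best = quadratic_maximum([(1.0, -2.25), (2.0, -0.25), (3.0, -0.25)])
```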
If analyses considering a subset of traits are carried out, WOMBAT writes out files EstimSubsetn+…+m.dat (see subsection 7.2.6), to be used as input files in a run with option --itsum or --pool. In addition, for each run performed, this file name is appended to SubSetsList. This file contains one line per ‘partial’ run with two entries: the file name (EstimSubsetn+…+m.dat) and a weight given to the corresponding results when combining estimates. The default for the weight is unity.
As one of the first steps in analyses fitting an animal model, WOMBAT checks the pedigree file supplied against the data file and, if applicable, deletes any individuals not linked to the data in any way. The new, reduced pedigree is written to this file. Like the original pedigree file, it contains 3 columns, i.e. codes for animal, sire and dam.
The second ‘pedigree modification’ carried out is ‘pruning’. This is performed for each genetic effect separately (provided they are assumed to be uncorrelated). The corresponding pedigrees are written to these files, with n = 1, 2, … pertaining to the order in which these effects are specified in the model part of the parameter file. Each has 7 columns: columns 1 to 3 give the animal, sire and dam recoded in running order, columns 4 to 6 give the corresponding original identities, and column 7 contains the inbreeding coefficient of the individual.
This file collects ‘time stamps’ for various stages of a run, together with explanatory messages for programmed stops. The content largely duplicates what is written to the screen. However, the file is closed after each message written, in contrast to a capture of the screen output which, under Linux, may be incomplete. It is intended for those running a multitude of analyses and wanting to trace what went on. N.B. This file is appended to, i.e. it will collect information for multiple runs in the current directory if not deleted between runs.