5.2 Basic run options

5.2.1 Continuation run

In some instances, REML estimation continuing from the ‘best’ estimates so far is necessary. This is invoked with the -c option.
If specified, WOMBAT will attempt to read its ‘starting’ values from the file BestPoint in the current working directory. N.B. If this is not available or if a read error occurs, the run proceeds as a ‘new’ run, i.e. using the starting values given in the parameter file instead. The -c option selects estimation using the AI algorithm, unless another procedure is selected explicitly.


Table 5.1: ‘Basic’ run options for WOMBAT
Option

Purpose

-c

Specify a continuation run

-v

Specify verbose screen output

-d

Specify very detailed screen output

-t

Specify terse screen output

Default REML algorithms
--good

Tell that you have good starting values

--bad

Tell that you have bad starting values

Non-estimation runs
--setup

Select a run performing the set-up steps only

--best

Select a run printing out estimates for the currently best point only

--blup

Select a prediction (BLUP) run; direct solution

--solvit

Select a prediction (BLUP) run; solve iteratively

--mmeout

Like --solvit, but write MME out to file.

--simul

Select a run simulating data only

--subset

Write out parameter files for analyses of subsets of traits

--itsum

Iterative summation of partial covariance matrices

--invert

Invert a dense symmetric matrix; pivot on largest diagonal

--invrev

Invert a dense symmetric matrix; reverse pivoting on no. of off-diagonal elements

--inveig

Invert a dense symmetric matrix; use eigen decomposition

--invspa

Invert a sparse symmetric matrix

Miscellaneous
--expiry

Print out the expiry date for the program (to screen)

--limits

Print out the hard-coded program limits (to screen)

--times

Print out various intermediate times for a run

--wide

Write ‘wide’ output files

5.2.2 Level of screen output

The amount of screen output can be regulated by the following options
-t selects terse output
-v selects verbose output, useful to check the parameter file
-d selects detailed output, useful for debugging

5.2.3 Set-up steps

It is good practice, to start each analysis with a run which carries out the set-up steps only, checking that the summary information provided on the model of analysis and data structure in file SumModel.out (see 7.1.2) is as expected, i.e. that WOMBAT is fitting the correct model and has read the data file correctly. This is specified using --setup. This option also invokes verbose screen output.

5.2.4 Quality of starting values

For analyses comprising 18 or less covariance components to be estimated, WOMBAT defaults to the AI algorithm for maximisation. For other analyses, the default is for WOMBAT to begin with up to 3 iterates of the PX-EM algorithm, before switching to an AI algorithm. For reduced rank estimation, every tenth AI iterate is subsequently replaced by a PX-EM step. Two options are provided to modify this according to our perception of the quality of starting values for covariance components available, without the need to consider the ‘advanced’ options below.

If we are confident, that we have good starting values, this might cause an unnecessary computational overhead. Specifying --good reduces the maximum number of (PX-)EM iterates to 1. Similarly, for potentially bad starting values, estimation might converge more reliably if a few more initial (PX-)EM iterates were carried out. Specifying --bad sets this number to 8. If the increase in log likelihood during the initial (PX-)EM iterates is less than 2.0  , WOMBAT will switch to the AI algorithms immediately.

5.2.5 Numerical settings

WOMBAT uses a small positive value as operational zero  to reduce numerical errors. The default value is 10−8  . This can be changed specifying the run option --zeron   where n  is a single-digit integer. This option redefines the operational zero to be 10−n  .

For variance component estimation parameterising to the elements of the Cholesky factors of covariance matrices, estimates are constrained to the parameter space by restricting the pivots in the decomposition to a small value. The default is 0.001  . This can be changed through the run option --pivotx   where is the real value specifying the new limit.

5.2.6 Intermediate results

WOMBAT writes out the currently best values of estimates of covariance components to the file BestPoint whenever the likelihood is increased. The option --best causes WOMBAT to read this file and write out the matrices of covariances and corresponding correlations in more readily legible format to the file BestSoFar.out. WOMBAT with this option can be used in a directory in which an estimation run is currently active, without interfering with any of the files used.

5.2.7 Prediction only

WOMBAT can be used for a simple BLUP run, using the ‘starting’ values given in the parameter files as assumed values for the true covariances. In this mode, no pruning of pedigrees is carried out. If -c is specified in addition, WOMBAT will try to read the values from the file BestPoint instead – this is useful to obtain ‘backsolutions’ after convergence of an estimation run. Solutions can be obtained either directly or iteratively:

5.2.8 Simulation only

Specifying --simul causes WOMBAT to sample values for all random effects fitted for the data and pedigree structure given by the respective files, from a multi-variate normal distribution with a mean of zero and covariance matrix as specified in the parameter file. Again -c can be used to acquire population values from the file BestPoint instead. No fixed effects are simulated, but the overall (raw) mean for each trait found in the data set is added to the respective records. Optionally, --simul can be followed directly (i.e. no spaces) by an integer number n  in the range of 1 to 999. If given, WOMBAT will generate n  simulated data sets (default n = 1  ).

Simulation uses two integer values as seeds to initialise the pseudo-random number generator. These can be specified explicitly in a file RandomSeeds (see 6.5.5.3). Output file(s) have the standard name(s) SimData001.dat, SimData002.dat, ...  , SimDatan   .dat. These have the same layout as the original data file, with the trait values replaced by the simulated records; see 7.2.5.

This option is not available for models involving covariance option GIN (see 4.8.1.2).

5.2.9 Sampling to approximate standard errors

Large sample theory indicates that maximum likelihood estimates are normally distributed with covariance matrix equal to the inverse of the information matrix. Hence, sampling from this distribution has been advocated as a simple strategy to approximate standard errors of ‘complicated’ functions of variance components without the need for a linear approximation or to evaluate derivatives [? ].

WOMBAT provides the run option --samplen   to obtain such samples in a post-analysis step, with n  denoting the number of samples to be drawn. If n  is omitted, the number of samples is set to a value of 10000  . It requires output files from an estimation run (at convergence), namely BestPoint, AvInfoCovs and AvInfoParms to specify the multivariate Normal distribution to sample from. By default, samples for all covariance components in the model are generated. In addition, the covariance matrix for a single random effect only can be selected by specifying its name in a SPECIAL statement (see 4.10.6.7). Samples generated are written to a file with the standard name CovSamples_ALL.dat or CovSamples_name  .dat with name  the name of the random effect selected (see 7.2.12). These files are ‘formatted’ (text) files that are suitable for further analysis using standard statistical software packages. In addition, a file named SumSampleAI.out is written which summarizes details of the run together with means and variances across replicates.

This option is not implemented for random regression analyses or random effects with diagonal covariance matrices. It tends to work best for estimation runs with the PC option, even when matrices are estimated at full rank.

5.2.10 Matrix inversion only

WOMBAT can be used as a ‘stand-alone’ program to invert a real, symmetric matrix. Both a ‘full’ inverse and a ‘sparse’ inverse are accommodated. ‘Full’ inversion of a dense matrix is limited to relatively small matrices - use run time option --limit to find the maximum size allowed in WOMBAT.

The matrix is expected to be supplied in a file, with one record per non-zero element in the upper triangle. Each record has to contain three space separated items: row number, column number and the matrix entry. There are several different ‘modes’:

1.
Using  --invert filename selects ‘standard’ generalised matrix inversion with pivoting on the largest diagonal; if the matrix if not of full rank, rows and columns corresponding to pivots with absolute value less than the operational zero are set to 0.
2.
Option --invrev filename is similar, but inversion is carried  out processing rows and columns in ascending order of the number of off-diagonal elements in the original matrix. The rationale for this is that the computational effort required is proportional to the square of the number of off-diagonal elements. Hence, this option tends to provide faster inversion, but is more succeptible to numerical instabilities than --invert.
3.
Option --inveig filename first performs an eigen-value and -vector decomposition of the matrix. If eigenvalues less than a specified value (default: 0 ) are found, these are set to the minimum, and a modified matrix is written out to a file filename.new before obtaining the generalised inverse from the inverse of the diagonal matrix of eigenvalues (ignoring any 0 entries) and the matrix of eigenvectors. For a matrix not of full rank and the default minimum of 0, this yield a generalised inverse of the same rank as option --invert, but the generalised inverse is different. A value different from 0 can be selected by appending it to the option (no spaces).

EXAMPLE:

wombat -v inveig0.0001 matrix.dat

specifies inversion of the matrix stored in matrix.dat, obtaining a generalised inverse by setting any eigenvalues less than 0.0001  to this value, prior to inversion.

This option should only be used for relatively small matrices.

HINT: Use this feature (with a minimum eigenvalue > 0) to modify an ‘invalid’ matrix of starting values for multivariate analyses.

4.
Option --invlap filename inverts a symmetric, positive definite matrix (only!) by calling the appropriate LAPACK routines.
5.
Option --invspa filename chooses sparse matrix  inversion. This is suitable for the inversion of large, sparse matrices where selected elements of the inverse are sufficient. Calculated and written out are all elements of the inverse corresponding to the non-zero elements in the Cholesky factor of the original matrix; any other elements of the inverse are ignored. The methodology is the same as that used in the Expectation-Maximisation steps in REML estimation. The relevant options to select the ordering strategy etc. (see 5.3.1) are recognised (except for --metisall), but must be given on the command line before --invspa.

The inverse is written out to filename.inv with one row per non-zero element, containing row number, column number and the element of the matrix (space separated).

EXAMPLE:

wombat -v ---amd ---invspa matrix.dat

specifies sparse matrix inversion of the matrix stored in matrix.dat, using approximate minimum degree ordering and requesting ‘verbose’ output. The output file containing the inverse is matrix.dat.inv.

5.2.11 Quadratic approximation

For some types of analyses a parameter ( e.g. the dilution factor for “social” genetic effects) is fixed at a given value and variance components are estimated for this value. To estimate the parameter then requires multiple runs for different values, each generating a point of the profile likelihood for this parameter. Provided an initial triplet of points can be established, comprised of a ‘middle’ parameter value with the highest likelihood bracketed by points with lower likelihoods, a quadratic approximation of the curve can be used to locate its maximum. WOMBAT collects the pairs of parameter values and corresponding likelihoods in a file with the standard name LogL4Quapprox.dat. Specifying the run option --quapp invokes an attempt at a quadratic approximation, utilizing the information from this file. If successful (i.e. if a suitable triplet of points can be found), the estimate of the parameter value at the maximum of the parabola is given, together with its approximate standard error.

5.2.12 Analyses of subsets of traits

For multivariate problems involving more than a few traits, it is often desirable to carry out analyses considering subsets of traits, for example, to obtain good starting values or to trace problems in higher-variate analyses back to particular constellations of traits.

WOMBAT provides the run time option --subsetn  to make such preliminary analyses less tedious. It will cause WOMBAT to read the parameter file for a multivariate analysis involving q  traits and write out the parameter files for all q  uni- (n = 1  ) or q(q− 1)∕2  bi-variate (n = 2  ) analyses possible. For n > 2  , the program will prompt for the running numbers of the n  traits to be considered and write out a single parameter file. These subset analyses are based on the data (and pedigree) file for the ‘full’ multivariate model, using the facility for automatic renumbering of traits and subset selection, i.e. no edits of these parameter files are necessary.

N.B.: This option does not carry through any information from a SPECIAL block, and is not implemented for random regression models or analyses involving correlated random effects or permanent environmental effects fitted as part of the residuals.

On analysis, encountering the syntax for trait renumbering (see 4.8.2) causes WOMBAT to write out an additional file with the estimates for the partial analysis (Standard name EstimSubsetn + ...+ m  .dat with n  to m  the trait numbers in the partial analysis, e.g. EstimSubset2+7.dat; see 7.2.6). In addition, the name of this output file is added to a file called SubSetsList.

HINT: WOMBAT will add a line to SubSetsList on each run – this may cause redundant entries. Inspect & edit if necessary before proceeding to combining estimates !

5.2.13 Pooling estimates of covariance components from part analyses

Combining estimates of covariances from analyses involving different subsets of traits is a regular task. This may be a preliminary step to a higher-dimensional multivariate analysis to obtain ‘good’ starting values. Alternatively, analyses for different subsets of traits may involve different data sets – selected to maximise the amount of information available to estimate specific covariances – and we may simply want to obtain pooled, possibly weighted estimates of the covariance matrices for all traits which utilise results from all partial analyses and are within the parameter space.

WOMBAT provides two procedures to combine estimates of covariance components from different analyses or modify existing matrices. These options are not available for analyses involving random regression models, correlated random effects or permanent environmental effects fitted as part of the residuals.

5.2.13.1 Iterative summing of expanded part matrices

Option --itsum selects a run to combine estimates from partial analyses, using the ’iterative summing of expanded part matrices’ approach of Mäntysaari [14] (see also Koivula et al. [12]), modified to allow for differential weighing of individual analyses. For this run, a file SubSetsList is assumed to exist and list the names of files containing results from analyses of subsets, and, optionally, the weightings to be applied (see 7.3.9). Pooled covariance matrices are written to a file name PDMatrix.dat as well as a file named PDBestPoint (see 7.2.7).

HINT: Use of --itsum is not limited to combining bi-variate analyses or the use of files with standard names (EstimSubsetn +...+ m  .dat), but all input files must have the form as generated by WOMBAT.

To use --itsum to combine estimates from analyses involving different data sets, be sure to a) number the traits in individual analyses appropriately (i.e. 1,...,q  with q  the total number of traits, not the number of traits in a partial analysis), and b) to use the syntax described in 4.8.2 to renumber traits – this will ‘switch on’ the output of subset results files.

Copy PDBestPoint to BestPoint and run WOMBAT with option --best to obtain a listing with values of correlations and variance ratios for the pooled results.

5.2.13.2 Pooling using a (penalized) maximum likelihood approach

Option --pool is similar to --itsum but a) employs a maximum likelihood approach, as described by ? ] , b) facilitates pooling of covariance matrices for all sources of variation simultaneously, and c) allows for penalties to be imposed aimed at ‘improving’ estimates by reducing sampling variation.

Input is as for --itsum and pooled estimates are written to files named PoolEstimates.out (summary file) and PoolBestPoint.

5.2.14 Miscellaneous

Option --expiry will print the expiry date for your copy of WOMBAT to the screen.

Option --limits can be used to  find at the upper limits imposed on analyses feasible in WOMBAT, as ‘hard-coded’ in the program. N.B. Sometimes these are larger than your computing environment (memory available) allows.

Option --times causes WOMBAT to  print out values for the CPU time used in intermediate steps.

Option --wide will generate formatted output  files which are wider than 80 columns.

Run option --help causes a list of available run options to be  printed to screen.


Table 5.2: ‘Advanced’ run options for WOMBAT
Option

Purpose

Ordering strategies for mixed model matrix (MMM)
--metis

Use multi-level nested dissection procedure (MeTiS) to re-order MMM

--mmd

Use multiple minimum degree method to re-order MMM

--amd

Use approximate minimum degree algorithms for re-ordering

--choozhz

Assign initial size of mixed model matrix interactively

Specific REML algorithms
--aireml

Use the AI algorithm

--nostrict

Allow the AI algorithm to take steps decreasing the log likelihood

--modaim

Choose method to modify the AI matrix to ensure it is positive definite

--emalg

Use the standard EM-algorithm

--pxem

Use the PX-EM algorithm

--emai

Use a few rounds of EM followed by AI REML (default)

--pxai

Use a few rounds of PX followed by AI REML

--cycle

Repeat cycles of (PX-)EM and AI iterates

--simplex

Use the Simplex procedure for derivative-free (DF) maximisation

--powell

Use Powell’s method of conjugate directions (DF)

--like1

Carry out a single likelihood evaluation only

Parameterisations
--logdia

Select log transformation of diagonal elements of Cholesky factor

--nologd

No log transformation of diagonal elements of Cholesky factor

--reorder

Pivot on largest diagonal elements of covariance matrices during Cholesky factorisation (default)

--noreord

Carry out Cholesky decomposition of covariance matrices sequentially

--reconstr

Allow change in order of pivots of Cholesky factors of covariance matrices and reconstruction of mixed model matrix

--noreconst

Do not allow changes in order of pivots and reconstruction of mixed model matrix

Miscellaneous
--noprune

Do not prune pedigrees

--batch

Do not pause after warning messages

--quaas

Use alternative algorithm [28] to set up NRM inverse

--redped

Switch on reduction of pedigree file for a sire model.