Chapter 4
The Parameter File



4.1 Overview

All information on the the model of analysis and the data and pedigree files is specified through the parameter file.

The name of the parameter file should have extension ’.par’. By default, WOMBAT expects to find a file wombat.par in the current working directory.
Other filenames can be specified at run time as the last command line option – see chapter 5 for details.

Setting up the parameter file is straightforward, but care and attention to detail are required to get it ‘just right’. The worked examples give ‘templates’ for various types of analyses which are readily adaptable to other problems. Parsing of the lines in the parameter file is fairly elementary, hence it is important to adhere strictly to the format described in the following in painstaking detail. Note also that parsing relies on the parameter file being a ‘plain’ text file and not containing any ‘non-printable’ characters – e.g. do not use TAB characters instead of white spaces.

WOMBAT performs a limited number of consistency checks on the variables and models specified, but these are by no means exhaustive and should not be relied upon!

If WOMBAT does find an obvious error, it will stop with a message like “exit WOMBAT” or, in verbose mode, “Programmed(error) stop for WOMBAT encountered”. Inconsistencies or errors not discovered are likely to wreak havoc in subsequent steps !

HINT: Use the -v option at run time. This will cause WOMBAT to echo each line in the parameter file as it is read (to the screen) – if there is a programmed error stop at this stage, it is easier to find the mistake as you know which is the offending line.

4.2 General rules

The parameter file is read line by line. Each line is assumed to be 88 characters long, i.e. anything in columns 89 onwards is ignored.

Any line with a # in column 1 is considered to be a comment line and is skipped.

WOMBAT relies on specific codes at the beginning of each line (leading blank spaces are ignored) to distinguish between various types of information given. Valid codes are summarised in 4.1. Most codes can be abbreviated to 3 letters. The code and the information following it should be separated by space(s). Codes are not case sensitive, i.e. can be given in either upper or lower case letters.

Depending on the initial code, WOMBAT expects further information on the same or following lines. All variable and file names specified are treated as case sensitive. File names can be up to 30 characters long, and variable names can comprise up to 20 characters. All codes, names and variables must be separated by spaces.

The parameter file can have two different types of entries :

1.
‘Short’ entries (e.g. file names, analysis type, comment) which consist of a single line.
Each of these lines must start with a code for the type of information given.

EXAMPLE:

PEDS pedigrees.dat

Here PEDS is the code for pedigree information, “pedigrees.dat” is the name of the file from which pedigree information is to be read.

2.
‘Long’ or block entries, spanning several lines (e.g. for the layout of the data file, or the model of analysis).
Each of these entries starts with a line which gives the code for the entry (see 4.1), and possibly some additional information. This is followed by lines with specific information, where of each of these lines again starts with a specific code. Except for blocks of starting values of covariance components, the block is terminated by a line beginning with END.

EXAMPLE:

MODEL 
  FIX  CGroup 
  RAN  Animal  NRM 
  TRA  Weight 
END MODEL

This shows a block entry for the model of analysis, where the model fits CGroup as a crossclassified, fixed effect and Animal as random effect with covariance matrix proportional to the numerator relationship matrix, and Weight is the trait to be analysed.

Different entries should be specified in the order in which they are listed in the following sections.

Table 4.1: Valid codes for entries in the parameter file - Part I
Code Within Indicator for
RUNOP Line with run options [4.3 ]
COMMENT  Comment line [4.4 ]
OVERRIDE  Change “hard-coded”program limit [4.5 ]
ANALYSIS Code for analysis type [4.6 ]
PEDS Name of pedigree file [4.7 ]
MRKFILE Name of file with marker counts [4.8 ]
   
DATA Name of data file [4.9 ]
TRNOS DAT Trait numbers (grouped input) [4.9.2 ]
NAMES DAT Multiple column names (grouped input) [4.9.2 ]
   
MODEL Model of analysis [4.10 ]
FIX MOD Name of fixed effect [4.10.2 ]
COV MOD Name of fixed covariable [4.10.3 ]
RAN MOD Name of random effect [4.10.4 ]
RRC MOD Name of control variable [4.10.4.1 ]
SUBJ MOD Name of variable identifying “subject” [4.10.5 ]
EXT MOD Name extra variable [4.10.6 ]
TRAIT MOD Trait name [4.10.7 ]
   
RESIDUAL Residual covariance matrix [4.11.1 ]
ERROR Residual covariance matrix [4.11.1 ]
VARIANCE Covariance matrix: random effects [4.11.2 ]
   
SE+USR Additional, user defined functions of covariances [4.12 ]
SUM SE+ Weighted sum
VRA SE+ Variance ratio
COR SE+ Correlation
   
POOL Options for pooling of covariance components [4.13 ]
PSEUPED POOL Pseudo pedigree structure
SMALL POOL Minimum eigenvalue
ZEROUT Fixed effects levels to be set to zero [4.14 ]
PSEUDOFX Random effects levels to be treated as random [4.15 ]
FIXVAR Fix covariance components [4.16 ]
END

Table 4.2: Valid codes for entries in the parameter file - Part II
Code Within Indicator for
SPECIAL Miscellaneous options [4.17 ]
WEIGHT SPECIAL Weighted analysis [4.17.1 ]
COVZER SPECIAL How to deal with ‘zero’ covariables [4.17.2 ]
CLONES SPECIAL Ignore sorting of data file [4.17.3 ]
FORCE-SE SPECIAL Force calculation of standard errors [4.17.4 ]
BSOLVE-SNP SPECIAL Invoke ‘back-solving’ for SNP effects [4.17.5 ]
REPEAT SPECIAL Run with low proportion of repeated records [4.17.6 ]
RPTCOV SPECIAL Specify residual covariance structure [4.17.6 ]
QTLEFF SPECIAL Specify which covariable is a QTL/SNP effect [4.17.7 ]
SAMPLEAI SPECIAL Sampling to approximate standard errors [4.17.8 ]
SOCIAL SPECIAL Analysis with ’social’ genetic effect [4.17.9 ]
GENGROUPS SPECIAL Fit ‘explicit’ genetic groups [4.17.10 ]
INCORE SPECIAL Analysis storing info in core [4.17.11 ]
AOM-RES SPECIAL Calculate outlier statistics for residuals [4.17.12 ]
KERNEL SPECIAL Gaussian kernel [4.17.13 ]
RRCORR-ALL SPECIAL Alll correlations for RR analyses [4.17.14 ]
HINVERSE SPECIAL Genomic and joint relationship matrices [4.17.15 ]
PENALTY SPECIAL, POOL Invoke penalized estimation [4.17.16 ]

4.3 Run options

While run time options (see chapter 5) are primarily intended to be read from the command line, WOMBAT also provides the facility to specify such options on the first line of the parameter file. To be recognised, this line must start with the code RUNOP and, after space(s), all options are expected to be given on the same line (space separated), in the same form as on the command line.

4.4 Comment line

WOMBAT allows for a single, optional comment line (up to 74 characters) to be specified. This is specified by the code COMMENT (can be abbreviated to COM) at the beginning of the line. Anything following the spaces after COM(MENT) is treated as comment on the analysis. It is printed in some of the output files generated by WOMBAT to assist the user, and has no other use.

4.5 Overriding default program limits

WOMBAT assigns most workspaces used dynamically. However, there are a number of dimensions which require an initial value and these have been assigned default values. Their settings can be inspected by running WOMBAT with run option --limits (see subsection 5.2.15). Occasionally, these defaults are too small or too large for the intended analysis or the hardware available. Hence an override mechanism is provided through a line in the parameter file consisting of the code OVERRIDE (can be abbreviated to OVER) at the beginning of the line followed (space separated) by the name of the variable to be changed and its new value. There should be one such line for each variable to be changed. Variable names recognised are:

MAXCOLS
for the maximum number of columns in the data file,
MAXCOV
for the maximum number of covariables to be fitted,
MAXFIX
for the maximum number fixed effects to be fitted,
MAXRAND
for the maximum number of random effects to be fitted,
MAXEXTRA
for the maximum number of “extra” effects to be fitted,
MAXMETA
for the maximum number of control variables in random regression analyses,
MAXTRAIT
for the maximum number of traits,
MAXCOMBI
for the maximum number of combinations of traits and records per individual,
MAXANIS
for the maximum number levels of the genetic effect (“animals”) corresponding to the pedigree or *.gin file supplied, and
MAXEQNS
for the maximum number of equations in the model of analysis

These names can be abbreviated to the first six letters. Values for MAXANIS and MAXEQNS apply to REML analyses or BLUP runs with direct solution for the mixed model equations – the limits for BLUP analyses via iterative solutions are obtained by multiplying the respective values with a factor of 20.

N.B.: Remember to put any such lines close to the top of the parameter file – so that they are read prior to parts specifying the data layout or model of analysis.

4.6 Analysis type

This is a single line entry beginning with code ANALYSIS (can be abbreviated to ANA), followed by a two- or three-letter code describing the type of analysis. The following codes are recognised :

UNI
for a univariate analysis,
MUV
for a ‘standard’ multivariate analysis,
RR
for a single-trait random regression analysis, and
MRR
for a multi-trait random regression analysis.

Except for UNI, this can be followed (after space(s)) by the code PC to select an analysis which fits the leading principal components only for some of the random effects fitted and yields reduced rank estimates of the corresponding covariance matrices. For MUV and MRR the number of traits in the analysis must be given as well – this should be the last entry of the line.

EXAMPLE:

ANALYSIS MUV PC 8

specifies a reduced rank, multivariate analysis for 8 traits

4.7 Pedigree file

This is a single line entry beginning with code PEDS (can be abbreviated to PED), followed by the name of the pedigree file. There is no default name. This entry is ‘optional’, in such that it is only required if a code of NRM if specified for the covariance structure of a random effect (see subsection 4.10.4). Only one pedigree file can be given. The format of the pedigree file required by WOMBAT is described in detail in section 6.3.

By default, WOMBAT fits an animal model. If a sire model is to be fitted, the code SIREMODEL (can be abbreviated to SIR) should be given after the filename.

Additional options, given after the filename and, if applicable the sire model option, that are recognised are +INBR or +SEX. These instruct WOMBAT to read animals’ inbreeding coefficients or sex codes (1=X/2=XX) from the fourth of fifth column in the pedigree file. If given, +INBR causes WOMBAT to skip calculation of inbreeding coefficients and use the values supplied instead. Option +SEX is required if an NRM for X-linked genetic effects is to be set up.

4.8 Marker counts file

This is a single line entry beginning with code MRKFILE (can be abbreviated to MRK), followed by the name of the file. There is no default name. This entry is ‘optional’, i.e. it is only required for selected analyses where the default name MarkerCounts.dat is to be replaced. The layout of the marker file required by WOMBAT is described in detail in section 6.4.

4.9 Data file

This is a block entry. The block begins with a line containing the code DATA (can be abbreviated to DAT) followed by the name of the data file. There is no default name. The general form of the data file required in described in section 6.2.

The following lines specify the record layout of the data file for all traits. There are two alternative ways of specification.

4.9.1 Simple

For each trait in turn, there should be one parameter file line for each column, up to the last column used in the analysis.
The lines can have up to 3 elements :

(a)
The code TRn where n is a one- or two-digit trait number. This can be omitted for univariate analyses.
(b)
The name of the variable in this column.
(c)
The maximum number of levels. This is required if the column represents a fixed or random effect in the model of analysis or a control variable in a random regression analysis.
Exception: For random effects with covariance option NRM (i.e. additive genetic effects) the number of levels can be given as 0 as WOMBAT counts the number of animals in the pedigree and replaces this number.

The block is terminated by a line with the code END.

EXAMPLE:

DATA  mydata.dat 
  TR1  traitno 2 
  TR1  animal  1000 
  TR1  fixeffect 50 
  TR1  weight 
  TR2  traitno 2 
  TR2  animal 500 
  TR2  fixeffect 30 
  TR2  feedintake 
END DATA

This shows the block for an analysis reading records for 2 traits from the file mydata.dat.

4.9.2 Compact

If there are several traits for which the record layout is the same, the respective record layout can be given for the whole group of traits. This avoids tedious duplication of lines.

This alternative is selected by placing the code GRP after the name of the data file (same line, separated by a space).
For each group of traits, the following lines need to be given :

1.
A ‘header’ line beginning with the code TRNOS (can be abbreviate to TRN), followed by the running numbers of the traits in the group on the same line.
2.
One line for each column in the data file (up to the last column used) which is the same for all traits, containing
(a)
the variable name
(b)
the maximum number of levels, if the column represents a fixed or random effect in the model of analysis (again, this number can be zero for additive genetic effects; see [4.9.1 ]).
3.
One line for each column which has a different name for different traits (e.g. representing the traits to be analysed), containing
(a)
the code NAMES (can be abbreviated to NAM)
(b)
the variable names (on the same line, space separated; the same number of variables as traits in the group must be given)

Again, the block is terminated by a line with the code END.

EXAMPLE:

DATA  mydata.dat  GRP 
   TRNOS  1 2 
   traitno     2 
   animal   1000 
   fixeffect  50 
   NAMES   weight  feedintake 
END DATA

This shows the ‘grouped’ alternative for specifying the data file layout, for two traits with the same column structure in the example above.

4.10 Model of analysis

This is another block entry. The block begins with a line containing the code MODEL (can be abbreviated to MOD), and finishes with a line beginning with END. The block then should contain one line for each effect to be fitted and one line for each trait in the analysis.

4.10.1 Effects fitted

Each of the ‘effect’ lines comprises the following

(a)
a three-letter code for the type of effect,
(b)
the effect name, where the effect name is a combination of the variable name for a column in the data file and, if appropriate, some additional information.
No abbreviations for variable names are permitted, i.e. there must be an exact match with the names specified in the DATA block.
(c)
If the effect is to be fitted for a subset of traits only, the running numbers of these traits must be given (space separated).

4.10.2 Fixed effects

Fixed effects can be cross-classified or nested fixed effects or covariables. The following codes are recognised :

FIX

This specifies a fixed effect in the model of analysis.

NB The name for a fixed effect should not contain a “(”, otherwise it is assumed that this effect is a covariable.

A simple, one-way interaction of two variables can be specified as vn1*vn2, with vn1 and vn2 valid variables names. [Not yet implemented !] 

HINT: ‘Not implemented’ here means merely that WOMBAT will not code the interaction for you – you can, of course, still fit a model with an interaction effect, but you a) have to insert an additional column in the data file with the code for the appropriate subclass, b) fit this as if it were a crossclassified fixed effect, and c) specify any additional dependencies arising explicitly (using ZEROUT, see below).

4.10.3 Fixed covariables

COV

This specifies a fixed covariable. The effect name should have the form “vn(n,BAF)”, where vn is a variable name, the integer variable n gives the number of regression coefficients fitted and BAF stands for a three-letter code describing the basis functions to be used in the regression. NB: By default, intercepts for fixed covariables are not fitted.

Valid codes for basis functions are

POL
for ordinary polynomials.
This is the default and can be omitted, i.e “vn(n)” is equivalent to “vn(n,POL)”. For instance, n = 2 denotes a quadratic regression. Note that WOMBAT deviates both records and covariables from their respective means prior to analysis.

N.B.: This yields a regression equation of the form

y − ¯y = b1(x − ¯x)+b2(x− ¯x)2+ ⋅⋅⋅

rather than an equation of form

               2
y = β0+ β1x+ β2x + ⋅⋅⋅

This should be born in mind when interpreting any solutions for regression coefficients for POL covariables from WOMBAT - while there is a straightforward relationship between coefficients βi and bi, they are not interchangeable.

LEG
for Legendre polynomials.

For example, n = 3 denotes a cubic polynomial, i.e. comprises a linear, quadratic and cubic regression coefficient, but no intercept (This differs from the implementation for random regressions where n = 4 denotes a cubic polynomial).

BSP
for B-spline functions

For analyses fitting spline functions, the degree of the spline is selected by specifying “L”, “Q” or “C” for linear, quadratic and cubic, respectively, immediately (no space) after the code BSP. Note that the default spline function is an equi-distant B-spline (i.e. the range of values of the covariable is divided into intervals of equal size), with the number of knots and intervals determined from the number of regression coefficients and the degree of the spline (k = n d + 1 where k is the number of knots and d is the degree, d = 1 for “L”, d = 2 for “Q” and d = 3 for “C”) [32, section 2.2]. Other spline functions are readily fitted as user-defined basis functions.

USR
for user defined functions

Fitting of an intercept (in addition to deviation from means) can be enforced by preceding n with a minus sign – this is not recommended unless there are no other fixed effects in the model.

A covariable to be fitted as nested within a fixed effect is specified as “vn1*vn2(n,BAF)”, with vn1 the name of the fixed effect. If vn1 is not fitted as a fixed effect, it must be specified an an ‘extra’ effect (see below).

WOMBAT is fussy when it encounters a covariable which has a value of zero: Covariables which can take such value are only permitted if a SPECIAL option is given which confirms that these are valid codes and not ‘missing’ values; see subsection 4.17.2.

4.10.4 Random effects

Random effects include the ‘control variables’, i.e. random covariables for random regression analyses. The following codes are recognised:

RAN

This code specifies a random effect. It should be followed (space separated) by the name of the random effect. After the name, a three-letter code describing the covariance structure of the effect can be given.
Valid codes for covariance structures are :

NRM
which denotes that the random effect is distributed proportionally to the numerator relationship matrix.
If this code is given, a pedigree file must be supplied.
SEX
which denotes that the random effect is distributed proportionally to the numerator relationship matrix for X-linked genetic effects.  This inverse of this matrix is set up directly from the pedigree information, as described by Fernando and Grossman [12].
If this code is given, it is assumed that an autosomal genetic effects (option NRM with corresponding pedigree file) is also fitted and has already been specified. In addition, the pedigree file is expected to have an additional column specifying the number of X-chromosomes for each individual.
IDE
which denotes that different levels of the random effect are uncorrelated. This is the default and can be omitted.
GIN
which denotes that the random effect is distributed proportionally to an ‘arbitrary’ (relationship or correlation) matrix.
The user must supply the inverse of this matrix in the form outlined in chapter 6.
PEQ
which denotes a permanent environmental effect of the animal for data involving ‘repeated’ records, which is not to be fitted explicitly. Instead, an equivalent model is used, which accounts for the respective covariances as part of the residual covariance matrix. This is useful for the joint analysis of traits with single and repeated records.

N.B.: Do not use this option for other effects - WOMBAT has no mechanism for checking that this option is appropriate.

For ’standard’ uni- and multivariate analyses, the random effect name is simply the variable name as given for data file.

4.10.4.1 Random regressions

For random regression analyses, the variable name is augmented by information about the number of random regression coefficients for this effect and the basis functions used. It becomes “vn(n,BAF)”, analogous to the specification for covariables above. As above, n specifies the number of regression coefficients to be fitted. In contrast to fixed covariables, however, an intercept is always fitted. This implies that n gives the order, not the degree of fit. For instance, n = 3 in conjunction with a polynomial basis function specifies a quadratic polynomial with the 3 coefficients corresponding to the intercept, a linear and a quadratic term. WOMBAT allows for different control variables to be used to fit random regressions for different effect. If the model has more than one RRC statement (see below), the specification of random effects needs to be extended to tell WOMBAT which control variable to use for which effect, i.e. “vn(n,BAF,rrcn)” with rrcn the name of the control variable.

Valid codes for BAF are:

POL
for ordinary polynomials.
This is the default and can be omitted, i.e “vn(n)” is equivalent to “vn(n,POL)”.
LEG
for Legendre polynomials.
BSP
for B-spline functions
USR
for user defined functions
IDE
for an identity matrix, i.e. the ith basis function has a single coefficient of “1” for the ith coefficient with all other elements zero. This requires the number of RR coefficients to be equal to the number of levels of the control variable.
It is useful when fitting a multi-trait model as a RR model, e.g. to allow for heterogeneous residual variances.
ONE
to assign a value of unity (“1”) to all coefficients.
This option has been introduced to facilitate a standard multivariate analysis with repeated records by fitting a random regression model to model an arbitrary pattern of temporary environmental covariances correctly.

RRC

This code specifies a ‘control variable’ in a random regression analysis. It should be followed (space separated) by the name of the variable, as given in the DATA statement.
Optionally, immediately after the name (no spaces), the range of values of the variable to be considered can be specified as (m n), with m the lower and n the upper limit.

N.B.: WOMBAT expects the value of the control variable to be non-negative (i.e. 0 or greater) and not to be a fraction (the control variable is read as a real variable but converted to the nearest integer, e.g. values of 3,.0, 3.1 and 3.4 would be treated as 3) – scale your values appropriately if necessary!

N.B.: For multivariate analyses, WOMBAT collects the range of values for the control variable (i.e. minimum and maximum) across all traits. This may be undesirable if different traits have distinct ranges and Legendre polynomials are used as basis function - if so, use the USR option and set up your own file which maps the range for each trait exactly as you want it!

4.10.5 ‘Subject’ identifier

For animal model analyses, WOMBAT assumes that the code for the first genetic effect fitted also identifies the subject on which measurements are taken. For some analyses (in particular those not fitting any additive genetic effects!) and sire models this is not appropriate, and such identifier needs to be supplied as an extra column in the data file.

SUBJ

This code needs to be followed by the name of the variable which identifies the individual.

4.10.6 ‘Extra’ effects

For some models, coding is required for effects which are not explicitly fitted in the model of analysis, for instance, when fitting nested effects. These need to be specified as ‘extra’ effects.

EXT

This code, followed by the respective variable name, denotes an effect which is not fitted in the model but which is required to define other effects.

4.10.7 Traits analysed

One line should be given for each trait. It should contain the following information :

(a)
The code TRAIT (can be abbreviated to TR).
(b)
The name of the trait, as specified in the DATA block.
(c)
The running number of the trait.
(d)
Optional : A numeric value (integer) representing a ‘missing’ value - any records with the trait equal to this value are ignored in the analysis (default is 123456789).1

4.11 Covariance components

Next, the parameter file needs to specify values for all (co)variance components in the model of analysis. For variance component estimation, these are the starting values used. For simple BLUP analyses and simulation runs, these are the values assumed to be the population values.

The input matrices for full rank analyses must be positive definite, i.e. cannot have eigenvalues less than the operational zero. For reduced rank (PC) analyses, some ‘zero’ but no negative eigenvalues are acceptable, provided the rank (i.e. the number of non-zero eigenvalues) is equal to (or greater than) the number of principal components to be fitted.

4.11.1 Residual covariances

4.11.1.1 ‘Standard’ multivariate analyses

The residual covariance matrix is specified by

1.
A line beginning with the code RESIDUAL (can be abbreviated to RES) or ERROR (can be abbreviated to ERR). This is followed by the dimension of the covariance matrix, q (integer number, space separated). If an analysis type PC has been specified, a second number specifying the rank of the estimated covariance matrix, needs to be given (even if this is equal to q).

Optionally, this can be followed (space separated) by a qualifier:

DIAG
specifies that the residual covariance matrix is diagonal
NOSTART
(can be abbreviated to NOS) specifies that no starting values are given; this is only valid in conjunction with the run option --pool !
2.
Without qualifier: The q(q + 1)2 elements of the upper triangle of the residual covariance matrix, given row-wise (i.e. σ12, σ12, , σ1q, σ22, σ23, , σq2).
These can be given several elements per line (space separated, taking care not to exceed the total line length of 78 characters), or one per line, or a mixture – WOMBAT will attempt to read until it has acquired all q(q + 1)2 elements required.

With qualifier: If DIAG is specified only the q diagonal elements (σ12, …, σq2) are read, and if NOSTART is given, no values are read.

4.11.1.2 Random regression analyses

Again, the residual covariance matrix is specified by

1.
A line beginning with the code RESIDUAL (can be abbreviated to RES) or ERROR (can be abbreviated to ERR). This should be followed by the dimension (q) of each residual covariance matrix (integer number), usually equal to the number of traits, and a code indicating what kind of error covariance function is to be fitted. The following codes are recognised :

HOM

This code specifies homogeneous error covariances for all values of the control variable.

HET

This code specifies heterogeneous error covariances, with the covariance function a step function of the control variable. It should be followed (space separated) by an integer number giving the number of steps.

2.
If the model involves multiple control variables, the name of the variable to be used to determine the temporary environmental covariance structure needs to be given at the end of the line.
3.
One or more lines with the residual covariance matrices, consisting of the q(q + 1)2 elements of the upper triangle given row-wise, and, if applicable, additional information.

4.11.2 Covariances for random effects

Similarly, for each covariance matrix due to random effects to be estimated, a ‘header’ line and the elements of the covariance matrix need to be specified.

1.
A line beginning with the code VARIANCE (can be abbreviated to VAR), followed by the name of the random effect and the dimension of the covariance matrix, q (integer number, space separated). If an analysis type PC has been specified, a second number specifying the rank of the estimated covariance matrix, needs to be given (even if this is equal to q).
The name can simply the random effects name as specified for the model of analysis. Alternatively, it can be of the form “vn1+vn2” where vn1 and vn2 are names of random effects specified in the MODEL block. This denotes that the two random effects are assumed to be correlated and that their joint covariance matrix is to be estimated1 1Currently implemented for full-rank, ‘standard’ analyses only ! .

Again, a qualifier DIAG or NOSTART, as described above for the residual covariance matrix (see subsubsection 4.11.1.1) is recognized.

2.
The q(q + 1)2 elements of the upper triangle of the covariance matrix, given row-wise (i.e. σ12, σ12, , σ1q, σ22, σ23, , σq2).
Again, these can be given several elements per line (space separated, taking care not to exceed the total line length of 78 characters), or one per line, or a mixture – WOMBAT will attempt to read until it has acquired all elements required.

As above, if DIAG is specified, only the q variances are read and with NOSTART no values are acquired.

4.12 Additional functions of covariance components

In addition, users can define other functions of covariance components which should be calculated and for which sampling errors should be approximated. This is done in a block entry, beginning with a line containing the code SE+USR (can be abbreviated to SE+), and ending with a line beginning with END. The block should then contain one line for each function to be calculated. The content of the line depends on the type of function. Three types are recognised.

1.
Linear combinations (weighted sums) of the covariance components in the model of analysis. For these, space separated entries o should be
(a)
The code SUM at the beginning of the line.
(b)
A name to identify the sum to be calculated.
(c)
An entry of the form n(w) for each component of the weighted sum, with n the running number of the covariance component and w the weight it is to be multiplied with. If w is unity, it can be omitted, i.e. the entry n is interpreted as n(1).
2.
Ratios of two covariance components. For these, the line should have three entries
(a)
The code VRA at the beginning of the line.
(b)
A name for the variance ratio to be calculated.
(c)
The running number of the covariance component in the numerator.
(d)
The running number of the covariance component in the denominator.
3.
Correlations, i.e. the ratio of a covariance component and the square root of the product of two other covariances. Line entries for these are
(a)
The code COR at the beginning of the line.
(b)
A name to identify the correlation to be calculated.
(c)
The running number of the covariance component in the numerator.
(d)
The running number of the first covariance component in the denominator.
(e)
The running number of the second covariance component in the denominator.

EXAMPLE:

SE+USR 
  SUM  siga+pe  1  2 
  VRA  repeat   5  4 
END

Consider a univariate analysis with repeated records per individual. Fitting additive genetic and permanent environmental effects of the animal, variance components in the model are σA2, σ PE2 and σ E2 (residual), with running numbers 1, 2 and 3, respectively. WOMBAT automatically calculates the phenotypic variance σP2= σA2 + σ PE2 + σ E2 and gives it running number 4. To calculate the repeatability and approximate its sampling error, we first need to define the sum of σA2 and σ PE2 as a new covariance (which receives running number 5), and then define the repeatability as the ratio of this component and σP2.

HINT: Run WOMBAT with the --setup option to begin with. Inspect the file ListOfCovs generated in this step – this gives a list of running numbers for individual covariance components.

4.13 Pooling estimates of covariance components

Information related to pooling of covariance components using a (penalized) maximum likelihood approach is specified in a block entry, beginning with a line containing the code POOL and ending with a line beginning with END. Within the block, the following directives are recognized:

PSEUPED

followed (space separated) by a three letter code specifying the assumed pedigree structure and values on sizes or numbers of families. The following pseudo-pedigree codes are available:

PHS
denotes a simple balanced paternal half-sib design. Optionally, two integer numbers giving the numbers of sires and the number of progeny per sire, respectively, can be given (space separated, on the same line) after the code. If not given, default values of 10 and 4 are used.

This option is suitable for a simple animal model only. WOMBAT will check that a) there is only one random effect fitted, and b) that this has covariance option NRM. If the MINPAR option (see below) is used, WOMBAT cannot perform these checks; hence the DIRADD code together with the name of the random effect, as described below, need to be given.

HFS
implies a balanced hierarchical full-sib design comprised of s sires, d dams per sire and n progeny per dam. Assumed values for s, d and n can be given (space-separated) on the same line. If omitted, default values of s = 10, d = 5 and n = 4 are used. This option is suitable for a simple animal model or a model fitting maternal permanent environmental effects in addition (i.e. accommodates a maximum of 2 random effects). Again, if the MINPAR option is used, codes DIRADD and MATPE need to be given in addition.
BON
selects a design comprising 8 individual per family in two generations, due to Bondari et al. [5]. Optionally, this code can be followed (space separated) by the number of such families (integer); if not given, a default value of 2 is used. For this design, expectations of covariances between relatives due to direct and maternal effects (genetic and permanent environmental) are available (i.e. a maximum of three effects). In order for WOMBAT to ‘know’ which random effect has which structure, additional information is required. This should comprise one additional line per random effect fitted, with each line consisting of a keyword specifying the type of random effect followed (space separated) by the name of the effect as specified in the model of analysis. Keywords recognized are DIRADD for direct additive genetic, MATADD for maternal genetic and MATPE for maternal permanent environmental effects.

EXAMPLE:

MODEL 
   RAN     animal  NRM 
   RAN     gmdam   NRM 
   RAN     pedam   IDE 
   trait   bwgt    1 
   trait   wwgt    2 
   trait   ywgt    3 
END MOD 
VAR animal   3  NOS 
VAR gmdam    3  NOS 
VAR pedam    3  NOS 
VAR residual 3  NOS 
POOL 
   PSEUPED    BON  10 
      DIRADD  animal 
      MATADD  gmdam 
      MATPE   pedam 
   SMALL      0.001 
   DELTAL     0.005 
END
USR
Other designs and models are readily accommodated by specifying the expectations of covariances between family members for each random effect fitted (excluding residuals - this is set automatically to be an identity matrix). This is selected by the USR option which must be followed (space separated) by an integer specifying the family size. Optionally, a second number giving the number of families is recognized (a default value of 2 is used if not given). Then the upper triangle of the matrix of coefficients in the expectation of covariances has to be given for each random effect fitted. For multiple random effects, these have to be given in the same order in which they have been specified in the VAR statements in parameter file, and the matrix for each effect has to begin on a new line.

EXAMPLE:

MODEL 
  RAN  animal  NRM 
    ... 
END MOD 
VAR   animal   4  NOSTART 
VAR   residual 4  NOSTART 
POOL 
   PSEUPED  USR  5 
   1.0  0.50  0.50  0.50  0.50 
   1.0  0.25  0.25  0.25 
   1.0  0.25  0.25 
   1.0  0.25 
   1.0 
END

This shows the coefficients for direct additive genetic effects for a family comprising a sire (individual 1) with four progeny from unrelated dams.

SMALL

followed (space separated) by a real number which gives the lower limit (1) for smallest eigenvalue allowed in the combined matrices. If this is not specified, a default value of 0.0001 is used.

DELTAL

followed (space separated) by a real number specifying the convergence criterion: if the increase in log likelihood between iterates falls below this value, convergence is assumed to have been achieved. If not specified a stringent default of 0.00005 is used.

MINPAR

The default assumption for the parameter file used is that it is a ‘full’ parameter file as for a corresponding, multivariate analysis. However, the information on pedigree and data file and their layout are not actually required. Hence the option MINPAR (on a line by itself) is provided which allows use of a ‘minimum’ parameter file, switching off some of the consistency checks otherwise carried out. If used, MINPAR must be the first entry within the POOL block.

The minimum information to be given in the parameter file must comprise:

1.
The ANALysis statement
2.
A VAR line for each covariance matrix, together with the NOSTART option telling WOMBAT not to expect a matrix of starting values.
3.
The POOL block, including statements showing which random effect represents which type of genetic or non-genetic effect.

EXAMPLE:

ANAL MUV 14 
VAR  animal    14  NOSTART 
VAR  residual  14  NOSTART 
POOL 
   MINPAR 
   SMALL  0.001d0 
   PSEUPED  hfs  100  10  4 
     DIRADD  animal 
END

SINGLE

The default form of input for results from part analysis is to read estimates from separate files, in the form of output generated by WOMBAT when carrying out multiple analyses of subsets of traits. Alternatively, all information can be given in a single file. This is selected by the option SINGLE, followed (space separated) by the name of the input file (same line). The layout required for this file is described in subsubsection 6.6.4.2.

PENALTY

followed (space separated) by a code word(s) defining the type and strength of penalty to be applied. Codes for the penalty type recognised are:

CANEIG
selects a penalty on the canonical eigenvalues. This has to be followed (space separated) by either ORG or LOG specifying a penalty on eigenvalues on the original scale or transformed to logarithmic scale (no default).
COVARM
specifies shrinkage of a covariance matrix towards a given target.
CORREL
chooses shrinkage of a correlation matrix towards a given target correlation matrix.

Either COVARM or CORREL can be followed (space separated) by the keyword MAKETAR. If this is given, WOMBAT determines the shrinkage target as the phenotypic covariance (or correlation) matrix obtained by summing estimates of covariances for all sources of variation from the preceding, unpenalized analysis. If this is not given, the upper triangle of the target matrix is expected to be read from a file with the standard name PenTargetMatrix; see subsubsection 6.6.7.1.

The last entry on the line relates to the tuning factor(s) to be used: If a single penalized analysis is to be carried out, the corresponding tuning factor should be given (real value). To specify multiple penalized analyses, specify the number of separate tuning factors multiplied by 1 as an integer value (e.g. -3 means 3 analyses), and list the corresponding tuning factors space separated on the next line.

EXAMPLE:

1.
Shrink all correlation matrices towards the phenotypic correlation matrix using a single tuning factor of 0.1, and calculate the shrinkage target from unpenalized results.
PENALTY CORREL  MAKETAR  0.1
2.
Shrink canonical eigenvalues on the logarithm scale towards their mean, using 5 different tuning factors
PENALTY CANEIG LOG  -5 
0.01  0.1  0.5  1.0  2.0

4.14 Dependencies among fixed effects

WOMBAT requires a coefficient matrix in the mixed model equations which is of full rank. Hence, constraints must be placed on the fixed effects part of the model if more than one fixed effect is fitted. By default, the first level of each cross-classified fixed effect other than the first is ‘zeroed out’ for each trait to account for dependencies. If there are additional dependencies, these should be identified and specified explicitly prior to each analysis.

WOMBAT performs a simple least-squares analysis on the fixed effects part of the model, attempting to find such additional dependencies. However, this procedure should not be relied upon, in particular for large data sets where numerical errors tend to accumulate sufficiently to obscure identification. Dependencies not properly taken into account can lead to problems during estimation !

Additional effects to be zeroed out are specified in a block entry. The block begins with a line containing the code ZEROUT (can be abbreviated to ZER), and finishes with a line beginning with END. The block then should contain one line for each additional dependency. Each line should contain three entries :

(a)
The name of the fixed effect, as specified in the MODEL block.
(b)
The ‘original’ code for the level to be zeroed out, as encountered in the data file.
(c)
The trait number; this can be omitted for univariate analyses.

4.15 Random effects to be treated as fixed

In some instances, it is desirable to treat selected levels of a random, genetic effect as if it were ‘fixed’. A typical example is the analysis of dairy data under a sire model. If the data includes highly selected proven bulls which only have records from their second crop of daughters, we might want to treat these bulls as ‘fixed’ as we would expect estimates of the genetic variance to be biased downwards otherwise.

Treating selected random genetic effects levels as fixed is achieved by replacing the diagonal in the numerator relationship matrix by a value of zero when setting up the inverse of the numerator relationship matrix. If there is any pedigree information for the effect, this is ignored.

N.B.: Treating levels of random effect(s) as fixed may lead to additional dependencies in the fixed effects part of the model, especially for multivariate analyses. Great care must be taken to identify these and specify them explicitly, as outlined above (section 4.14). WOMBAT has no provision to account for this possibility !

Random genetic effects levels to treated as fixed are specified in a block entry. The block begins with a line containing the code PSEUDOFX (can be abbreviated to PSE). This can be followed (space separated) by a real number. If given, this value (which should be a small positive number, e.g. 0.0001 or 0.001) is used instead of the value of 0 when specifying the contributions to the inverse of the numerator relationship matrix. Effectively, this treats the respective effect level as “just a little bit random”. This option is provided as a mean to counteract additional dependencies in the model (see warning above). It is particularly useful for prediction, and should be used very carefully with estimation runs. As usual, the block finishes with a line beginning with END. The block then should contain one line for each effect. Each line should contain two entries :

(a)
The name of the random effect, as specified in the MODEL block.
This should be a genetic effect, distributed proportionally to the numerator relationship matrix among animals.
(b)
The ‘original’ code for the level to be treated as ’fixed’, as given in the pedigree file.

N.B.: This option has only undergone fairly rudimentary testing !
It is not available in conjunction with the PX-EM algorithm.
Selecting this option prohibits re-use of inverse NRM matrices set up in any previous runs.

4.16 Covariance components to be treated as fixed

WOMBAT provides limited facilities to fix certain covariance components at their input values, while maximising the (log) likelihood with respect to the remaining parameters. Again, this information is given in a block entry. The block begins with a line containing the code FIXVAR (no abbreviation) and, as usual, ends with a line containing the code END. Each line within the block is then expected to have the following entries:

(a)
The name of the random effect, as specified in the MODEL block.
(b)
The code ALL to signify that all pertaining covariances are fixed.

In the moment, no options to fix individual elements of covariance matrices are recognised.

4.17 Special information

A place to collect miscellaneous options which do not fit into the categories above is provided through a block entry beginning with a line SPECIAL, and ending with a line END.

Options are available to address the following problems:

4.17.1 Weighted analysis

A weighted analysis is specified by a line with the following space separated entries:

(a)
The code WEIGHT at the beginning of the line.
(b)
The name of the column in the data containing the weight.
(c)
Optionally, for standard uni- and multivariate analyses, this can be followed by a code to select the type of weighting to apply:
none
is the default: each record is multiplied by the weight given prior to the analysis.
DESIGN
causes entries of “1” in the design matrices to be replaced with the weight specified (observations are not scaled).
RESID
multiplies the diagonal elements of the residual covariance matrix pertaining to the observation with the weight given; in addition, any non-zero elements for a pair of records (on the same individual) are scaled with the geometric mean of the corresponding pair of weights (i.e. √-----
 wiwj).

EXAMPLE: Consider an animal with two records and corresponding weights w1 = 0.8 and w2 = 2. Let the unweighted residual covariance matrix be

( 10  −5 )                 (  0.1143 0.0286 )
  −5  20      with inverse     0.0286 0.0571

The weighted matrix is then

(               )                (               )
       8  −6.325     with inverse   0.14286  0.02259
   −6.325      40                   0.02259  0.02857

4.17.2 Covariables that are zero

WOMBAT now checks for covariables which have a value of zero. If you fit a covariable which has such values, it expects you to confirm that this is intentional. This is done through a single line with space separated entries:

(a)
The code COVZER at the beginning of the line.
(b)
The full name of the covariable as given in the model statement (i.e. including brackets, number of coefficients and, if applicable, type).
(c)
A three letter option: FIT to confirm that a value of zero for this covariable is a valid code, or IGN to skip such records (any other option will lead to a programmed error stop).

4.17.3 Animals that are clones

WOMBAT will stop if it encounters non-consecutive records for the same subject in different parts of the data file, assuming that the data has not been sorted as required. However, there are situations where this is a ‘planned’ scenario, for example to fit data on clones, where we want the same genetic effect but different residual effects for members of a clone. Hence, this check can be switched off by including a line with the code CLONES in the SPECIAL block.

4.17.4 Standard errors for fixed and random effect solutions

WOMBAT reports solutions for fixed and random effects fitted – these are calculated as a by-product by all algorithms to obtain REML estimates of variance components. However, standard deviations/errors are only reported for those algorithms which invert the coefficient matrix in the mixed model equations. An option is provided to enforce this – this is invoked by a line (in a SPECIAL block) with the code FORCE-SE. If standard errors have been obtained automatically, this has no effect. Otherwise, it forces calculation of the inverse of the coefficient matrix at convergence.

N.B.: For large analyses, this can take quite some extra time.

4.17.5 GWAS from ssGBLUP: backsolving for marker effects

For analyses incorporating genomic information in the relationship matrix for additive genetic effects, WOMBAT provides the opportunity to ‘back solve’ from predicted breeding values for marker effects and their standard error at the end of a BLUP or REML run, providing a GWAS scan; see Aguilar et al. [1]. This is invoked by a line (in a SPECIAL block) with the code BSOLVE-SNP rname, where rname is the name of the random effect representing additive genetic effects.

Details about additional information and input files needed and the output generated are available with Example 22.

4.17.6 Repeated records

WOMBAT attempts to check the data for records representing repeated measurements for a trait and individual, and stops when this is fairly low. This can be overridden to some extent: Analyses with repeated records for a trait for less than 20% and more than 10% of individuals can be enabled by a single line with the code REPEAT at the beginning of the line.

Standard multivariate analyses (MUV) with repeated records can represent a somewhat difficult scenario, as proper modelling of the residual covariance structure relies on differentiating between records for different traits are taken simultaneously and thus should be assumed to be subject to a common,temporary environmental (residual) covariance, and which should not. There are four strategies which can be chosen by specifying a single line with the following space separated entries:

(a)
The code RPTCOV at the beginning of the line.
(b)
A seven-letter qualifier to select he strategy to be used. Valid terms are:
INDESCR
to fit a residual covariance σE ij between all observations for traits i and j for an individual (this was the previous default).
ALIGNED
to fit a residual covariance σE ij only between obervations with the same running running number, e.g. the first record for trait i and the first record for trait j (this is the current default).
ZEROALL
for the case where all temporary environmental covariances σE ij are assumed to be zero. This should be accompanied by corresponding starting values for the residual covariance components.
TSELECT
when none of the other options are appropriate and where the data file has a column with a ‘time of recording’ indicator (integer). The name of this column must be given on the same line, space-separated after TSELECT. This allows for proper identification of records for different traits taken at the same time – and thus assumed to have a non-zero, temporary environmental covariance – and those which are not when there is an arbitrary pattern of missing records.

EXAMPLE:

SPECIAL 
   RPTCOV TSELECT  mtime 
END

Specification of this option is required for multivariate analyses combining traits with single and repeated records.

4.17.7 Fitting SNP effects

When using the run option --snap, the covariable in the model representing SNP effects needs to be identified by a line in a SPECIAL block containing

(a)
The code SNPEFF at the beginning of the line.
(b)
Space separated, the full name of the covariable, as specified in the MODEL block (i.e. including brackets and the order of it).
(c)
Optional, for analyses estimating more than one regression coefficient: Space separated, the code COVOUT to specify that the upper triangle of the covariance matrix of regression coefficients is written out.

In addition, a COVZER statement may be required to ensure WOMBAT treats count of zero as valid covariable values.

For ‘house-keeping’ reasons, WOMBAT requires that these covariables are specified (in the MODEL block) before any other, ‘regular’ covariables to be fitted.

EXAMPLE:

SPECIAL 
   SNPEFF  snp(1) 
   COVZER  snp(1)  FIT 
END

4.17.8 Sampling to approximate standard errors

The default in drawing samples from distribution of estimates of covariance matrices is to sample the parameters estimated, i.e. the elements of the Cholesky factor of the matrix of interest, and to construct the matrix samples from these. This yields samples within the parameter space. However, it can also yield substantial differences in the mean of samples and their variances from the ‘input’ values. Hence an option is provided to select sampling of the covariance matrices instead. This will yield means close to the REML estimates and estimates of sampling variances closer to those derived from the inverse of the average information matrix, but is likely to yield samples out of the parameter space.

To select a single covariance matrix to be sampled when approximating sampling variances (see subsection 5.2.9) requires a single line in a SPECIAL block with at least two entries:

(a)
The code SAMPLEAI at the beginning of the line.
(b)
The name of the random effect as specified in the MODEL block (space separated).
(c)
Optionally, this can be followed (space separated) by the code COVAR to select sampling of covariance matrices.
(d)
Optionally again, COVAR can be followed (space separated) by the code TRUNC which invokes modification of samples with eigenvalues close to zero (truncating at 104) to ensure the sample has the rank specified.

EXAMPLE:

SPECIAL 
   SAMPLEAI  animal 
END

4.17.9 Fitting “social” genetic effects

To fit a model with an additional random effect representing individuals’ competitive genetic effects, a random regression model analysis has to be selected – fitting random effects at an order of 1 with a basis function of “1” (option ONE) yields an equivalent model to a standard, univariate analysis whilst providing the flexibility to model residual variances as heterogeneous. WOMBAT then requires a “social” group code to be supplied and collects the identities of the individuals in each group as part of its set-up steps.

Additional information required is to be supplied by a single line in the SPECIAL block. This should contain the following, space separated entries:

(a)
The code SOCIAL at the start of the line.
(b)
The name of the random effect representing the “social” genetic effect. This can be abbreviated by omitting the part in brackets specifying the order of fit and basis function (e.g. (1,ONE)).
(c)
The name of the fixed or extra effect representing the “social” group.
(d)
An Integer variable giving the maximum group size.
(e)
Optional: A Real variable giving the dilution factor d to be used. If omitted, a default value of d = 0 (i.e. no dilution) is used.
(f)
Optional: If a non-zero dilution factor has been given, an additional, four-letter code is recognised which specifies the “target group size” for which competitive genetic variance components are to be estimated. Codes available are NTWO which targets a group size of 2 (by using an entry of [1(n1)]d in the corresponding design matrix) and NBAR which targets the average group size (using coefficients [(n1)(n1)]d in the design matrix; see Bijma [4]). If omitted, the default used is NTWO.

For each run where a dilution factor has been specified, WOMBAT appends the value of the dilution factor used together with the corresponding maximum log likelihood to a file in the working directory named LogL4Quapprox.dat (see subsection 7.3.8). These represent points on the profile likelihood curve for the dilution factor, and a quadratic approximation can be used to determine its maximum. The run time option --quapp is provided to assist in this task.

4.17.10 Fitting ‘explicit’ genetic groups

When unknown parent groups (UPG) are to be treated as random, an alternative to including them in the inverse of the relationship matrix among additive genetic effects [see 48] is to fit them explicitly, i.e. to fit them as a separate random effect.

y = Xb + Zu + ZQg + e

with y, b, u, g and e denoting the vectors of observations, fixed effects, random effects, group effects and residuals, respectively, and X and Z the corresponding design matrices. Q is the matrix linking individuals’ additive genetic effects to groups, with each row of Q potentially containing multiple non-zero elements, tracing the proportional contribution of genetic groups through the pedigree, which sum to unity. WOMBAT allows for Q to be supplied by the user or can set up Q in core if group codes for missing sire and dam identities are given.

Instructing WOMBAT to fit genetic groups thus requires these to be specified as a random effect in the model of analysis, and the corresponding (co)variance components statements need to be provided. In addition, an entry in a SPECIAL block at the bottom of the parameter file is needed. This should consist of a single line, with the following, space separated entries:

(a)
The code GENGROUPS at the start of the line.
(b)
The name of the random effect in the model representing genetic group effects.
(c)
An integer variable giving the number of genetic groups to be fitted (NB: This must be the actual number, not just some number at least as big).
(d)
A variable treated either as a scale factor or a special option:
(e)
The name of the random effect in the model representing individuals’ additive genetic effect corresponding to the genetic groups.

For example,

EXAMPLE:

MODEL 
       ... 
# specify animal genetic effects first 
  RAN animal  GIN 
  RAN gggrps  IDE 
       ... 
END 
       ... 
SPECIAL 
# specify which random effect represents groups, 
# no. of levels and scale factor for proportions 
  GENGROUPS gggrps 27 1 animal 
       ... 
END

Depending on the covariance option for additive genetic effects, information on Q or the UPG codes is expected to be read either from the corresponding pedigree file (NRM), starting with column 5 or from the *.codes file (GIN), beginning in column 4.

The elements of the rows of Q are read as a string of g space separated values (with g the number of groups as given in the SPECIAL block) for each individual. These values are assumed to have been obtained by multiplying the proportions with the scale factor specified.

For example, for g = 5 and a scale factor of 1000, the *.codes file might be:

EXAMPLE:

1 2301   1   900     0   100     0     0 
2 2302   1     0   900     0     0   100 
3 2303   2   100     0   800   100     0 
4 2304   2     0     0     0     0  1000 
            ...

with columns 1 and 2 the running number and original animal code, column 3 the type of animal (1 = non genotyped, 2 = genotyped; for compatibility, this code needs to be included for all run options if genetic groups are fitted, including those that do not treat genotyped animals differently). Columns 4 to 8 then give the proportions of group membership, multiplied by the scale of 1000 for each of the 5 groups.

If it is chosen to supply UPG codes instead, these are expected to be given as columns 5 (sire) and 6 (dam) in the pedigree or columns 4 (sire) and 5 (dam) in the *.codes file. NB: For this option the genetic group codes need to be coded from 1 to the number of groups

4.17.11 In core storage

The default for WOMBAT is to repeatedly read data and relationship information from disk so as to minimize memory requirements. In-core storage of selected information can be specified with a line in a SPECIAL block. This should begin with the code INCORE followed (space separated) by one or more qualifier(s) choosing which information is to be stored in core. Valid qualifiers are NRM (inverse of numerator relationship matrix), GIN (*.gin matrix), DATA (recoded data), RAW (raw data), ZRE (*.matrix or *.chlsky required or reduced rank analyses usng AI-REML) and ALL (all five).

EXAMPLE:

SPECIAL 
       ... 
   INCORE GIN DATA 
       ... 
END

4.17.12 Calculation of alternative outlier model statistics

Calculation of alternative outlier model (AOM) statistics for residuals or random effects fitted is invoked by specifying the option AOM-RES or AOM-RND by itself on a line in a SPECIAL block in the parameter file.

EXAMPLE:

SPECIAL 
       ... 
   AOM-RES 
       ... 
END

The AOM options are implemented for uni- and full rank standard multivariate analyses only; they are ignored for reduced rank (PC) analyses and random regression models. Calculations are carried out for one observation or random effects level at a time. For large models or cases with dense coefficient matrices C, this can take quite a long time.

AOM statistics for residual effects, R1e∘ -----−1----−-1---−1---′-−1-
  Dia(R   − R  WC    W  R   ), are reported together with residuals in Residuals.dat. This file has one row per record with up to seven columns. Columns 1 to 3 are written out for all analyses and contain the residual, the predicted observation and the actual observation (for reference). Specifying AOM-RES adds the diagonal element of the projection matrix (pii; column 4), the respective element of R1e (the nominator of the AOM statistic; column 5) and the AOM statistic (column 6). For univariate analyses, the corresponding diagonal element of the “Hat” matrix (σ2[1 σ2pii] with σ2 the residual variance) is given in addition (column 7).

AOM statistics for random effects, G1u∘ -----−-1----−1-ZZ--−1-
  Dia(G   − G   C  G   ), are reported alongside the corresponding solutions in the RnSoln_*.dat files. This file has up to 10 columns. If AOM-RND has been specified, the last three columns give the diagonal element of the covariance matrix (T_ii), the element of G1u ((G{-1}u)_i) and the AOM statistic, respectively. Where the denominator is essentially zero (e.g. for additive genetic effects for parent without records), the AOM statistics is set to zero.

4.17.13 Analyses using a Gaussian Kernel matrix

Kernel based prediction using genomic information is attracting increasing interest. This can be parameterised as

pict

To facilitate estimation of the bandwidth parameter, WOMBAT has been adapted to calculate the inverse of the kernel matrix for a chosen value of 𝜃, with dij the ’distance’ for individuals i and j.

To fit a Gaussian kernel matrix as ‘relationship matrix’, the random effect which this matrix pertains to needs to be specified with the covariance option GIN. The corresponding *.gin file should follow the same format as for other effects with such covariance structure, except that the elements given should be the squared distances, dij2 and that the diagonals of 0 can be omitted. Only rudimentary checks of the input values are performed, e.g. WOMBAT stops if a zero distance between a pair of individuals is found, as this will result in a non-positive definite matrix.

The bandwidth parameter to be used needs to be specified in a SPECIAL block. This should contain a line starting with the code KERNEL followed (space-separated) by the name of the random effect as given in the MODEL block, and the value of 𝜃. E.g. for random effect animal and 𝜃 = 1.5

EXAMPLE:

SPECIAL 
       ... 
   KERNEL animal   1.5 
       ... 
END

WOMBAT will then construct the kernel matrix for the given value of 𝜃 and attempt to invert it, as well as calculate its log determinant. K is assumed to be dense and is automatically stored ‘in-core’. Inversion is carried out using LAPACK [3] routine DPFTRF [19] which requires a positive definite matrix.

On completion of the estimation step for a given value of 𝜃, it is written to the file LogL4Quapprox.dat together with the corresponding maximum log likelihood. To determine the next value on the profile likelihood for 𝜃 to be evaluated, WOMBAT can be run with the command line option --quapp (see subsection 5.2.12) which performs a quadratic approximation of the profile log likelihood for 𝜃.

4.17.14 Correlations for random regression analyses

By default, WOMBAT calculates and writes out a small number of correlations on the ’original scale’ for the estimated covariance functions. These are primarily given to assist with checking your own calculations to obtain the correlations you are interested in - covariances and correlations are easily calculated using the file of evaluated basis functions provided.

The complete correlation matrix is not provided by default as this might result in big output files and many calculations needed. However, if you really want correlations for all values of the control value, you can specify this by a single line in a SPECIAL block containing the code RRCORR-ALL. If the analysis fits a single control variable, this writes out not only RanRegCorrels.dat, but also writes an additional file RanRegCorrAll.dat (see subsection 7.2.4) with the same information in a format more convenient for further processing, e.g. plotting.

4.17.15 Genomic relationship matrices

WOMBAT has a special module for a pre-analysis step to carry out a number of calculations related to genomic relationship matrices and single step analyses.

Documentation on the various options available is given in a separate pdf file distributed with Example 20.

4.17.16 Penalized estimation

WOMBAT can carry out estimation imposing a penalty on the log likelihood function to improve the sampling properties of estimates. Three types of penalties are accommodated:

1.
A penalty regressing the so-called canonical eigenvalues towards their mean, as described by Meyer and Kirkpatrick [37], also referred to as ‘bending’. This can be done for the eigenvalues directly or after transformation to logarithmic scale. In addition, there is a choice of parameterisation.

As the canonical transformation involves only two matrices, this type of penalty is restricted to multivariate analyses fitting a simple animal model or random regression analyses fitting animals’ genetic and permanent environmental effects only.

2.
A penalty proportional to the matrix divergence between a covariance matrix to be estimated and a specified target matrix. This will shrink the estimate towards the target. Options are available to shrink a covariance matrix or a corresponding correlation matrix. A particular choice of target is the phenotypic covariance or correlation matrix estimated in a standard (unpenalized) analysis [see 38] – hence WOMBAT provides the facility to write out such target matrices. While this could, in principle, be applied to multiple matrices, the current implementation is restricted to a single matrix.
3.
A penalty on a correlation matrix,  shrinking the corresponding partial auto-correlations towards zero or their phenotypic counterparts. The penalty is derived assuming a (shifted) Beta distribution and its stringency is regulated through choice of the so-called prior effective sample size.

Penalized estimation is invoked by one or more lines in a SPECIAL block. It generates several additional output files, described in subsection 7.2.11.

4.17.16.1 Selecting a penalty

Penalized estimation is specified by a line with space separated entries:

(a)
The code PENALTY at the beginning of the line.
(b)
A code specifying what type of penalty is to be applied. Values recognised are:
CANEIG
selects a penalty on the canonical eigenvalues.
COVARM
specifies shrinkage of a covariance matrix.
CORREL
chooses shrinkage of a correlation matrix towards a given target correlation matrix.
KANEIG
selects a penalty on the canonical eigenvalues, obtained assuming a Beta distribution.
KORREL
selects a penalty on individual correlations, assumed to have a Beta distribution.
PACORR
chooses shrinkage of a the matrix of partial auto-correlations for a random effect towards the corresponding phenotypic or an identity matrix.

The following variables depend on the form of penalty chosen.

For CANEIG:

(c)
A character variable to specify the ‘scale’ to be used. Options available are:
ORG
selects a penalty on the canonical eigenvalues on the original scale.
LOG
specifies a penalty on the canonical eigenvalues after transformation to logarithmic scale.
LOGB
is similar to LOG but a penalty is applied to both log(λi) and log(1 λi), where λi denotes the ith canonical eigenvalue.
(d)
A character variable to specify the parametrisation of λi to be used. Options available are:
ORG
estimate λi directly.
LOG
estimate log(λi), i.e. λi transformed to logarithmic scale.
TIC
estimate log(λi(1 λi)), i.e. λi transformed to logistic scale.

This variable is optional; if omitted, it is set to the same value as that given for the scale of the penalty (LOG for LOGB).

(e)
A real or integer variable specifying either the tuning factor ψ to be used, or, if estimates for multiple tuning factors are to be obtained, the number of ψ’s to be read subsequently or the number of different ranges of ψ’s to be stepped through:

All values of ψ given should be in ascending order.

For COVARM and CORREL:

(c)
An optional character variable:
PHENV
This option specifies that the estimate of the phenotypic covariance or correlation matrix is to be written out to the file PenTargetMatrix.
It is best used in a preliminary step with a single ψ = 0 to generate the matrix to be used in subsequent penalized runs where we wish to shrink towards this matrix.
The default is for this file to exist and to be read.

EXAMPLE:

SPECIAL 
   PENALTY  COVARM PHENV  animal  0.0 
END
KLDIV
The default penalty on a matrix of size q × q is
pict

with T the shrinkage target and C = (ψ + q + 1)∕ψ. The latter value originates from the assumption of an Inverse Wishart distribution for Σ. This option sets C = 1, i.e. applies a penalty proportional to the Kullback-Leibler divergence between Σ and T.

EXAMPLE:

SPECIAL 
   PENALTY  COVARM  KLDIV  animal 7.50 
END
(d)
The name of the random effect covariance matrix to be penalized. This must match the name given in a VARIANCE statement (see section 4.11) earlier in the parameter file.

EXAMPLE:

SPECIAL 
   PENALTY  CORREL  animal -1 
   0.0 5.0 1.0 
END
(e)
Information on tuning parameter(s) to be used, as for CANEIG (see above).

For KANEIG: see example 19 For KORREL: see example 19 For PACORR:

(c)
An optional character variable:
PHENV
This specifies that the shrinkage target is the phenotypic matrix of partial auto-correlations
IDENT
This specifies that the shrinkage target is the identity matrix

If omitted, the default shrinkage target is the identity matrix.

(d)
The name of the random effect covariance matrix to be penalized. This must match the name given in a VARIANCE statement (see section 4.11) earlier in the parameter file.
(e)
A real value specifying the ‘effective sample size’ of the Beta prior (no multiple values allowed for this form of penalty).

EXAMPLE:

SPECIAL 
   PENALTY PACORR  animal  4.0 
END

4.17.16.2 Stopping after a given change in likelihood

If multiple tuning factors are given, it is sometimes desirable to stop once the deviation in the unpenalized log likelihood from the maximum (at ψ = 0) has exceeded a certain limit. This can be specified by an additional line with three space-separated entries:

(a)
The code PENALTY at the beginning of the line.
(b)
The code LIMIT.
(c)
A real variable with the value for the limit.

EXAMPLE:

SPECIAL 
   PENALTY  CORREL  KLDIV  animal -2 
   0.0  5.0   0.5 
   6.0  15.0  1.0 
   PENALTY  LIMIT  3.315 
END

For this option to work, the first tuning factor given has to be ψ = 0!

HINT: Use run time option --–valid (subsection 5.3.2) for validation steps when using cross-validation to estimate the value for ψ to be used.