4 The Parameter File

Chapter 4
The Parameter File

4.1 Overview

All information on the the model of analysis and the data and pedigree ﬁles is speciﬁed through the parameter ﬁle.

The name of the parameter ﬁle should have extension ’.par’. By default, WOMBAT expects to ﬁnd a ﬁle wombat.par in the current working directory.
Other ﬁlenames can be speciﬁed at run time as the last command line option – see chapter 5 for details.

Setting up the parameter ﬁle is straightforward, but care and attention to detail are required to get it ‘just right’. The worked examples give ‘templates’ for various types of analyses which are readily adaptable to other problems. Parsing of the lines in the parameter ﬁle is fairly elementary, hence it is important to adhere strictly to the format described in the following in painstaking detail. Note also that parsing relies on the parameter ﬁle being a ‘plain’ text ﬁle and not containing any ‘non-printable’ characters – e.g. do not use TAB characters instead of white spaces.

WOMBAT performs a limited number of consistency checks on the variables and models speciﬁed, but these are by no means exhaustive and should not be relied upon!

If WOMBAT does ﬁnd an obvious error, it will stop with a message like “exit WOMBAT” or, in verbose mode, “Programmed(error) stop for WOMBAT encountered”. Inconsistencies or errors not discovered are likely to wreak havoc in subsequent steps !

HINT: Use the -v option at run time. This will cause WOMBAT to echo each line in the parameter ﬁle as it is read (to the screen) – if there is a programmed error stop at this stage, it is easier to ﬁnd the mistake as you know which is the oﬀending line.

4.2 General rules

The parameter ﬁle is read line by line. Each line is assumed to be 90 characters long, i.e. anything in columns 91 onwards is ignored.

Any line with a # in column 1 is considered to be a comment line and is skipped. For some (non-numeric) input a continuation line to be read can be ﬂagged by placing a + as the last non-blank character on the line to be continued.

WOMBAT relies on speciﬁc codes at the beginning of each line (leading blank spaces are ignored) to distinguish between various types of information given. Valid codes are summarised in 4.1. Most codes can be abbreviated to 3 letters. The code and the information following it should be separated by space(s). Codes are not case sensitive, i.e. can be given in either upper or lower case letters.

Depending on the initial code, WOMBAT expects further information on the same or following lines. All variable and ﬁle names speciﬁed are treated as case sensitive. File names can be up to 30 characters long, and variable names can comprise up to 20 characters. All codes, names and variables must be separated by spaces.

The parameter ﬁle can have two diﬀerent types of entries :

1.

‘Short’ entries (e.g. ﬁle names, analysis type, comment) which consist of a single line.
Each of these lines must start with a code for the type of information given.

EXAMPLE:
PEDS pedigrees.dat
Here PEDS is the code for pedigree information, “pedigrees.dat” is the name of the ﬁle from which pedigree information is to be read.

2.

‘Long’ or block entries, spanning several lines (e.g. for the layout of the data ﬁle, or the model of analysis).
Each of these entries starts with a line which gives the code for the entry (see 4.1), and possibly some additional information. This is followed by lines with speciﬁc information, where of each of these lines again starts with a speciﬁc code. Except for blocks of starting values of covariance components, the block is terminated by a line beginning with END.

EXAMPLE:
MODEL 
  FIX  CGroup 
  RAN  Animal  NRM 
  TRA  Weight 
END MODEL
This shows a block entry for the model of analysis, where the model ﬁts CGroup as a crossclassiﬁed, ﬁxed eﬀect and Animal as random eﬀect with covariance matrix proportional to the numerator relationship matrix, and Weight is the trait to be analysed.

Diﬀerent entries should be speciﬁed in the order in which they are listed in the following sections.

Table 4.1: Valid codes for entries in the parameter ﬁle - Part I

Code	Within	Indicator for
RUNOP	–	Line with run options [4.3]
COMMENT	–	Comment line [4.4]
OVERRIDE	–	Change “hard-coded”program limit [4.5]
ANALYSIS	–	Code for analysis type [4.6]
PEDS	–	Name of pedigree ﬁle [4.7]
MRKFILE	–	Name of ﬁle with marker counts [4.8]

DATA	–	Name of data ﬁle [4.9]
TRNOS	DAT	Trait numbers (grouped input) [4.9.2]
NAMES	DAT	Multiple column names (grouped input) [4.9.2]

MODEL	–	Model of analysis [4.10]
FIX	MOD	Name of ﬁxed eﬀect [4.10.2]
COV	MOD	Name of ﬁxed covariable [4.10.3]
RAN	MOD	Name of random eﬀect [4.10.4]
RRC	MOD	Name of control variable [4.10.4.1]
SUBJ	MOD	Name of variable identifying “subject” [4.10.5]
EXT	MOD	Name extra variable [4.10.6]
TRAIT	MOD	Trait name [4.10.7]

RESIDUAL	–	Residual covariance matrix [4.11.1]
ERROR	–	Residual covariance matrix [4.11.1]
VARIANCE	–	Covariance matrix: random eﬀects [4.11.2]

SE+USR		Additional, user deﬁned functions of covariances [4.12]
SUM	SE+	Weighted sum
VRA	SE+	Variance ratio
COR	SE+	Correlation

POOL	–	Options for pooling of covariance components [4.13]
PSEUPED	POOL	Pseudo pedigree structure
SMALL	POOL	Minimum eigenvalue
ZEROUT		Fixed eﬀects levels to be set to zero [4.14]
PSEUDOFX		Random eﬀects levels to be treated as random [4.15]
FIXVAR	–	Fix covariance components [4.16]
END

Table 4.2: Valid codes for entries in the parameter ﬁle - Part II

Code	Within	Indicator for
SPECIAL	–	Miscellaneous options [4.17]
WEIGHT	SPECIAL	Weighted analysis [4.17.1]
COVZER	SPECIAL	How to deal with ‘zero’ covariables [4.17.2]
CLONES	SPECIAL	Ignore sorting of data ﬁle [4.17.3]
FORCE-SE	SPECIAL	Force calculation of standard errors [4.17.4]
BSOLVE-SNP	SPECIAL	Invoke ‘back-solving’ for SNP eﬀects [4.17.5]
REPEAT	SPECIAL	Run with low proportion of repeated records [4.17.6]
RPTCOV	SPECIAL	Specify residual covariance structure [4.17.6]
QTLEFF	SPECIAL	Specify which covariable is a QTL/SNP eﬀect [4.17.7]
SAMPLEAI	SPECIAL	Sampling to approximate standard errors [4.17.8]
SOCIAL	SPECIAL	Analysis with ’social’ genetic eﬀect [4.17.9]
GENGROUPS	SPECIAL	Fit ‘explicit’ genetic groups [4.17.10]
INCORE	SPECIAL	Analysis storing info in core [4.17.11]
AOM-RES	SPECIAL	Calculate outlier statistics for residuals [4.17.12]
KERNEL	SPECIAL	Gaussian kernel [4.17.13]
RRCORR-ALL	SPECIAL	Alll correlations for RR analyses [4.17.14]
HINVERSE	SPECIAL	Genomic and joint relationship matrices [4.17.15]
PENALTY	SPECIAL, POOL	Invoke penalized estimation [4.17.16]

4.3 Run options

While run time options (see chapter 5) are primarily intended to be read from the command line, WOMBAT also provides the facility to specify such options on the ﬁrst line of the parameter ﬁle. To be recognised, this line must start with the code RUNOP and, after space(s), all options are expected to be given on the same line (space separated), in the same form as on the command line.

4.4 Comment line

WOMBAT allows for a single, optional comment line (up to 74 characters) to be speciﬁed. This is speciﬁed by the code COMMENT (can be abbreviated to COM) at the beginning of the line. Anything following the spaces after COM(MENT) is treated as comment on the analysis. It is printed in some of the output ﬁles generated by WOMBAT to assist the user, and has no other use.

4.5 Overriding default program limits

WOMBAT assigns most workspaces used dynamically. However, there are a number of dimensions which require an initial value and these have been assigned default values. Their settings can be inspected by running WOMBAT with run option --limits (see subsection 5.2.15). Occasionally, these defaults are too small or too large for the intended analysis or the hardware available. Hence an override mechanism is provided through a line in the parameter ﬁle consisting of the code OVERRIDE (can be abbreviated to OVER) at the beginning of the line followed (space separated) by the name of the variable to be changed and its new value. There should be one such line for each variable to be changed. Variable names recognised are:

MAXCOLS: for the maximum number of columns in the data ﬁle,
MAXCOV: for the maximum number of covariables to be ﬁtted,
MAXFIX: for the maximum number ﬁxed eﬀects to be ﬁtted,
MAXRAND: for the maximum number of random eﬀects to be ﬁtted,
MAXEXTRA: for the maximum number of “extra” eﬀects to be ﬁtted,
MAXMETA: for the maximum number of control variables in random regression analyses,
MAXTRAIT: for the maximum number of traits,
MAXCOMBI: for the maximum number of combinations of traits and records per individual,
MAXANIS: for the maximum number levels of the genetic eﬀect (“animals”) corresponding to the pedigree or *.gin ﬁle supplied, and
MAXEQNS: for the maximum number of equations in the model of analysis

These names can be abbreviated to the ﬁrst six letters. Values for MAXANIS and MAXEQNS apply to REML analyses or BLUP runs with direct solution for the mixed model equations – the limits for BLUP analyses via iterative solutions are obtained by multiplying the respective values with a factor of 20.

N.B.: Remember to put any such lines close to the top of the parameter ﬁle – so that they are read prior to parts specifying the data layout or model of analysis.

4.6 Analysis type

This is a single line entry beginning with code ANALYSIS (can be abbreviated to ANA), followed by a two- or three-letter code describing the type of analysis. The following codes are recognised :

UNI: for a univariate analysis,
MUV: for a ‘standard’ multivariate analysis,
RR: for a single-trait random regression analysis, and
MRR: for a multi-trait random regression analysis.

Except for UNI, this can be followed (after space(s)) by the code PC to select an analysis which ﬁts the leading principal components only for some of the random eﬀects ﬁtted and yields reduced rank estimates of the corresponding covariance matrices. For MUV and MRR the number of traits in the analysis must be given as well – this should be the last entry of the line.

EXAMPLE:
ANALYSIS MUV PC 8
speciﬁes a reduced rank, multivariate analysis for 8 traits

4.7 Pedigree ﬁle

This is a single line entry beginning with code PEDS (can be abbreviated to PED), followed by the name of the pedigree ﬁle. There is no default name. This entry is ‘optional’, in such that it is only required if a code of NRM if speciﬁed for the covariance structure of a random eﬀect (see subsection 4.10.4). Only one pedigree ﬁle can be given. The format of the pedigree ﬁle required by WOMBAT is described in detail in section 6.3.

By default, WOMBAT ﬁts an animal model. If a sire model is to be ﬁtted, the code SIREMODEL (can be abbreviated to SIR) should be given after the ﬁlename.

Additional options, given after the ﬁlename and, if applicable the sire model option, that are recognised are +INBR or +SEX. These instruct WOMBAT to read animals’ inbreeding coeﬃcients or sex codes (1=X/2=XX) from the fourth of ﬁfth column in the pedigree ﬁle. If given, +INBR causes WOMBAT to skip calculation of inbreeding coeﬃcients and use the values supplied instead. Option +SEX is required if an NRM for X-linked genetic eﬀects is to be set up.

4.8 Marker counts ﬁle

This is a single line entry beginning with code MRKFILE (can be abbreviated to MRK), followed by the name of the ﬁle. There is no default name. This entry is ‘optional’, i.e. it is only required for selected analyses where the default name MarkerCounts.dat is to be replaced. The layout of the marker ﬁle required by WOMBAT is described in detail in section 6.4.

4.9 Data ﬁle

This is a block entry. The block begins with a line containing the code DATA (can be abbreviated to DAT) followed by the name of the data ﬁle. There is no default name. The general form of the data ﬁle required in described in section 6.2.

The following lines specify the record layout of the data ﬁle for all traits. There are two alternative ways of speciﬁcation.

4.9.1 Simple

For each trait in turn, there should be one parameter ﬁle line for each column, up to the last column used in the analysis.
The lines can have up to 3 elements :

(a): The code TRn where n is a one- or two-digit trait number. This can be omitted for univariate analyses.
(b): The name of the variable in this column.
(c): The maximum number of levels. This is required if the column represents a ﬁxed or random eﬀect in the model of analysis or a control variable in a random regression analysis.
Exception: For random eﬀects with covariance option NRM (i.e. additive genetic eﬀects) the number of levels can be given as 0 as WOMBAT counts the number of animals in the pedigree and replaces this number.

The block is terminated by a line with the code END.

EXAMPLE:

DATA  mydata.dat 
  TR1  traitno 2 
  TR1  animal  1000 
  TR1  fixeffect 50 
  TR1  weight 
  TR2  traitno 2 
  TR2  animal 500 
  TR2  fixeffect 30 
  TR2  feedintake 
END DATA

This shows the block for an analysis reading records for 2 traits from the ﬁle mydata.dat.

4.9.2 Compact

If there are several traits for which the record layout is the same, the respective record layout can be given for the whole group of traits. This avoids tedious duplication of lines.

This alternative is selected by placing the code GRP after the name of the data ﬁle (same line, separated by a space).
For each group of traits, the following lines need to be given :

1.

A ‘header’ line beginning with the code TRNOS (can be abbreviate to TRN), followed by the running numbers of the traits in the group on the same line.

2.

One line for each column in the data ﬁle (up to the last column used) which is the same for all traits, containing

(a): the variable name
(b): the maximum number of levels, if the column represents a ﬁxed or random eﬀect in the model of analysis (again, this number can be zero for additive genetic eﬀects; see [4.9.1]).

3.

One line for each column which has a diﬀerent name for diﬀerent traits (e.g. representing the traits to be analysed), containing

(a): the code NAMES (can be abbreviated to NAM)
(b): the variable names (on the same line, space separated; the same number of variables as traits in the group must be given)

HINT: Use a continuation line if the variable names don’t all ﬁt on a single line.

Again, the block is terminated by a line with the code END.

EXAMPLE:
DATA  mydata.dat  GRP 
   TRNOS  1 2 
   traitno     2 
   animal   1000 
   fixeffect  50 
   NAMES   weight  feedintake 
END DATA
This shows the ‘grouped’ alternative for specifying the data ﬁle layout, for two traits with the same column structure in the example above.

4.10 Model of analysis

This is another block entry. The block begins with a line containing the code MODEL (can be abbreviated to MOD), and ﬁnishes with a line beginning with END. The block then should contain one line for each eﬀect to be ﬁtted and one line for each trait in the analysis.

4.10.1 Eﬀects ﬁtted

Each of the ‘eﬀect’ lines comprises the following

(a): a three-letter code for the type of eﬀect,
(b): the eﬀect name, where the eﬀect name is a combination of the variable name for a column in the data ﬁle and, if appropriate, some additional information.
No abbreviations for variable names are permitted, i.e. there must be an exact match with the names speciﬁed in the DATA block.
(c): If the eﬀect is to be ﬁtted for a subset of traits only, the running numbers of these traits must be given (space separated).

4.10.2 Fixed eﬀects

Fixed eﬀects can be cross-classiﬁed or nested ﬁxed eﬀects or covariables. The following codes are recognised :

FIX

This speciﬁes a ﬁxed eﬀect in the model of analysis.

NB The name for a ﬁxed eﬀect should not contain a “(”, otherwise it is assumed that this eﬀect is a covariable.

A simple, one-way interaction of two variables can be speciﬁed as vn1*vn2, with vn1 and vn2 valid variables names. [Not yet implemented !]

HINT: ‘Not implemented’ here means merely that WOMBAT will not code the interaction for you – you can, of course, still ﬁt a model with an interaction eﬀect, but you a) have to insert an additional column in the data ﬁle with the code for the appropriate subclass, b) ﬁt this as if it were a crossclassiﬁed ﬁxed eﬀect, and c) specify any additional dependencies arising explicitly (using ZEROUT, see below).

4.10.3 Fixed covariables

COV

This speciﬁes a ﬁxed covariable. The eﬀect name should have the form “vn(n,BAF)”, where vn is a variable name, the integer variable n gives the number of regression coeﬃcients ﬁtted and BAF stands for a three-letter code describing the basis functions to be used in the regression. NB: By default, intercepts for ﬁxed covariables are not ﬁtted.

Valid codes for basis functions are

POL: for ordinary polynomials.
This is the default and can be omitted, i.e “vn(n)” is equivalent to “vn(n,POL)”. For instance, n = 2 denotes a quadratic regression. Note that WOMBAT deviates both records and covariables from their respective means prior to analysis.

N.B.: This yields a regression equation of the form
$y − ¯y = b1(x − ¯x)+b2(x− ¯x)2+ ⋅⋅⋅$
rather than an equation of form
$2 y = β0+ β1x+ β2x + ⋅⋅⋅$
This should be born in mind when interpreting any solutions for regression coeﬃcients for POL covariables from WOMBAT - while there is a straightforward relationship between coeﬃcients β_i and b_i, they are not interchangeable.
LEG: for Legendre polynomials.
For example, n = 3 denotes a cubic polynomial, i.e. comprises a linear, quadratic and cubic regression coeﬃcient, but no intercept (This diﬀers from the implementation for random regressions where n = 4 denotes a cubic polynomial).
BSP: for B-spline functions
For analyses ﬁtting spline functions, the degree of the spline is selected by specifying “L”, “Q” or “C” for linear, quadratic and cubic, respectively, immediately (no space) after the code BSP. Note that the default spline function is an equi-distant B-spline (i.e. the range of values of the covariable is divided into intervals of equal size), with the number of knots and intervals determined from the number of regression coeﬃcients and the degree of the spline (k = n − d + 1 where k is the number of knots and d is the degree, d = 1 for “L”, d = 2 for “Q” and d = 3 for “C”) [31, section 2.2]. Other spline functions are readily ﬁtted as user-deﬁned basis functions.
USR: for user deﬁned functions

Fitting of an intercept (in addition to deviation from means) can be enforced by preceding n with a minus sign – this is not recommended unless there are no other ﬁxed eﬀects in the model.

A covariable to be ﬁtted as nested within a ﬁxed eﬀect is speciﬁed as “vn1*vn2(n,BAF)”, with vn1 the name of the ﬁxed eﬀect. If vn1 is not ﬁtted as a ﬁxed eﬀect, it must be speciﬁed an an ‘extra’ eﬀect (see below).

WOMBAT is fussy when it encounters a covariable which has a value of zero: Covariables which can take such value are only permitted if a SPECIAL option is given which conﬁrms that these are valid codes and not ‘missing’ values; see subsection 4.17.2.

4.10.4 Random eﬀects

Random eﬀects include the ‘control variables’, i.e. random covariables for random regression analyses. The following codes are recognised:

RAN

This code speciﬁes a random eﬀect. It should be followed (space separated) by the name of the random eﬀect. After the name, a three-letter code describing the covariance structure of the eﬀect can be given.
Valid codes for covariance structures are :

NRM: which denotes that the random eﬀect is distributed proportionally to the numerator relationship matrix.
If this code is given, a pedigree ﬁle must be supplied.
SEX: which denotes that the random eﬀect is distributed proportionally to the numerator relationship matrix for X-linked genetic eﬀects. This inverse of this matrix is set up directly from the pedigree information, as described by Fernando and Grossman [12].
If this code is given, it is assumed that an autosomal genetic eﬀects (option NRM with corresponding pedigree ﬁle) is also ﬁtted and has already been speciﬁed. In addition, the pedigree ﬁle is expected to have an additional column specifying the number of X-chromosomes for each individual.
IDE: which denotes that diﬀerent levels of the random eﬀect are uncorrelated. This is the default and can be omitted.
GIN: which denotes that the random eﬀect is distributed proportionally to an ‘arbitrary’ (relationship or correlation) matrix.
The user must supply the inverse of this matrix in the form outlined in chapter 6.
PEQ: which denotes a permanent environmental eﬀect of the animal for data involving ‘repeated’ records, which is not to be ﬁtted explicitly. Instead, an equivalent model is used, which accounts for the respective covariances as part of the residual covariance matrix. This is useful for the joint analysis of traits with single and repeated records.

N.B.: Do not use this option for other eﬀects - WOMBAT has no mechanism for checking that this option is appropriate.

For ’standard’ uni- and multivariate analyses, the random eﬀect name is simply the variable name as given for data ﬁle.

4.10.4.1 Random regressions

For random regression analyses, the variable name is augmented by information about the number of random regression coeﬃcients for this eﬀect and the basis functions used. It becomes “vn(n,BAF)”, analogous to the speciﬁcation for covariables above. As above, n speciﬁes the number of regression coeﬃcients to be ﬁtted. In contrast to ﬁxed covariables, however, an intercept is always ﬁtted. This implies that n gives the order, not the degree of ﬁt. For instance, n = 3 in conjunction with a polynomial basis function speciﬁes a quadratic polynomial with the 3 coeﬃcients corresponding to the intercept, a linear and a quadratic term. WOMBAT allows for diﬀerent control variables to be used to ﬁt random regressions for diﬀerent eﬀect. If the model has more than one RRC statement (see below), the speciﬁcation of random eﬀects needs to be extended to tell WOMBAT which control variable to use for which eﬀect, i.e. “vn(n,BAF,rrcn)” with rrcn the name of the control variable.

Valid codes for BAF are:

POL: for ordinary polynomials.
This is the default and can be omitted, i.e “vn(n)” is equivalent to “vn(n,POL)”.
LEG: for Legendre polynomials.
BSP: for B-spline functions
USR: for user deﬁned functions
IDE: for an identity matrix, i.e. the i−th basis function has a single coeﬃcient of “1” for the i−th coeﬃcient with all other elements zero. This requires the number of RR coeﬃcients to be equal to the number of levels of the control variable.
It is useful when ﬁtting a multi-trait model as a RR model, e.g. to allow for heterogeneous residual variances.
ONE: to assign a value of unity (“1”) to all coeﬃcients.
This option has been introduced to facilitate a standard multivariate analysis with repeated records by ﬁtting a random regression model to model an arbitrary pattern of temporary environmental covariances correctly.

RRC

This code speciﬁes a ‘control variable’ in a random regression analysis. It should be followed (space separated) by the name of the variable, as given in the DATA statement.
Optionally, immediately after the name (no spaces), the range of values of the variable to be considered can be speciﬁed as (m − n), with m the lower and n the upper limit.

N.B.: WOMBAT expects the value of the control variable to be non-negative (i.e. 0 or greater) and not to be a fraction (the control variable is read as a real variable but converted to the nearest integer, e.g. values of 3,.0, 3.1 and 3.4 would be treated as 3) – scale your values appropriately if necessary!

N.B.: For multivariate analyses, WOMBAT collects the range of values for the control variable (i.e. minimum and maximum) across all traits. This may be undesirable if diﬀerent traits have distinct ranges and Legendre polynomials are used as basis function - if so, use the USR option and set up your own ﬁle which maps the range for each trait exactly as you want it!

4.10.5 ‘Subject’ identiﬁer

For animal model analyses, WOMBAT assumes that the code for the ﬁrst genetic eﬀect ﬁtted also identiﬁes the subject on which measurements are taken. For some analyses (in particular those not ﬁtting any additive genetic eﬀects!) and sire models this is not appropriate, and such identiﬁer needs to be supplied as an extra column in the data ﬁle.

SUBJ: This code needs to be followed by the name of the variable which identiﬁes the individual.

4.10.6 ‘Extra’ eﬀects

For some models, coding is required for eﬀects which are not explicitly ﬁtted in the model of analysis, for instance, when ﬁtting nested eﬀects. These need to be speciﬁed as ‘extra’ eﬀects.

EXT: This code, followed by the respective variable name, denotes an eﬀect which is not ﬁtted in the model but which is required to deﬁne other eﬀects.

4.10.7 Traits analysed

One line should be given for each trait. It should contain the following information :

(a)

The code TRAIT (can be abbreviated to TR).

(b)

The name of the trait, as speciﬁed in the DATA block.

(c)

The running number of the trait.

In most cases, this is simply the number from 1 to q, where q is the total number of traits in a multivariate analysis.
In addition, WOMBAT provides the opportunity to replace the trait number in the data ﬁle with a diﬀerent number. This is useful, for instance, to carry out an analysis involving a subset of traits without the need to edit the data ﬁle. The syntax for this is : “k”->m. This speciﬁes that value k in the data ﬁle should be replaced with value m for the analysis. If this is encountered, any records with trait numbers not selected in this fashion are ignored, i.e. this provides a mechanism for automatic subset selection.

HINT: All q traits in the analysis should be speciﬁed in this manner, even if k = m for some trait(s).

(d)

Optional : A numeric value (integer) representing a ‘missing’ value - any records with the trait equal to this value are ignored in the analysis (default is −123456789).¹

4.11 Covariance components

Next, the parameter ﬁle needs to specify values for all (co)variance components in the model of analysis. For variance component estimation, these are the starting values used. For simple BLUP analyses and simulation runs, these are the values assumed to be the population values.

The input matrices for full rank analyses must be positive deﬁnite, i.e. cannot have eigenvalues less than the operational zero. For reduced rank (PC) analyses, some ‘zero’ but no negative eigenvalues are acceptable, provided the rank (i.e. the number of non-zero eigenvalues) is equal to (or greater than) the number of principal components to be ﬁtted.

4.11.1 Residual covariances

4.11.1.1 ‘Standard’ multivariate analyses

The residual covariance matrix is speciﬁed by

1.

A line beginning with the code RESIDUAL (can be abbreviated to RES) or ERROR (can be abbreviated to ERR). This is followed by the dimension of the covariance matrix, q (integer number, space separated). If an analysis type PC has been speciﬁed, a second number specifying the rank of the estimated covariance matrix, needs to be given (even if this is equal to q).

Optionally, this can be followed (space separated) by a qualiﬁer:

DIAG: speciﬁes that the residual covariance matrix is diagonal
NOSTART: (can be abbreviated to NOS) speciﬁes that no starting values are given; this is only valid in conjunction with the run option --pool !

2.

Without qualiﬁer: The q(q + 1)∕2 elements of the upper triangle of the residual covariance matrix, given row-wise (i.e. σ₁², σ₁₂, …, σ_1q, σ₂², σ₂₃, …, σ_q²).
These can be given several elements per line (space separated, taking care not to exceed the total line length of 78 characters), or one per line, or a mixture – WOMBAT will attempt to read until it has acquired all q(q + 1)∕2 elements required.

With qualiﬁer: If DIAG is speciﬁed only the q diagonal elements (σ₁², …, σ_q²) are read, and if NOSTART is given, no values are read.

4.11.1.2 Random regression analyses

Again, the residual covariance matrix is speciﬁed by

1.

A line beginning with the code RESIDUAL (can be abbreviated to RES) or ERROR (can be abbreviated to ERR). This should be followed by the dimension (q) of each residual covariance matrix (integer number), usually equal to the number of traits, and a code indicating what kind of error covariance function is to be ﬁtted. The following codes are recognised :

HOM: This code speciﬁes homogeneous error covariances for all values of the control variable.
HET: This code speciﬁes heterogeneous error covariances, with the covariance function a step function of the control variable. It should be followed (space separated) by an integer number giving the number of steps.

2.

If the model involves multiple control variables, the name of the variable to be used to determine the temporary environmental covariance structure needs to be given at the end of the line.

3.

One or more lines with the residual covariance matrices, consisting of the q(q + 1)∕2 elements of the upper triangle given row-wise, and, if applicable, additional information.

For HOM only a single covariance matrix needs to be given.
For HET the number of covariance matrices speciﬁed must be equal to the number of steps (in the step function, i.e. intervals). Each should begin on a new line and be preceded by two integer values (space separated) which give the upper and lower limits (inclusive) of the interval of the control variable, for which this matrix is applicable.

N.B.: ‘Step intervals’ must cover the complete range of values for the control variable encountered in the data.
EXAMPLE: For a univariate RR analysis with values of the control variable ranging from 1 to 5, say we want to ﬁt separate residual variances for 1 & 2, 3 & 4 and 5:
```
VAR  residual  1  HET  3 
    1  2   1.234 
    3  4   3.456 
    5  5   5.678
```

4.11.2 Covariances for random eﬀects

Similarly, for each covariance matrix due to random eﬀects to be estimated, a ‘header’ line and the elements of the covariance matrix need to be speciﬁed.

1.: A line beginning with the code VARIANCE (can be abbreviated to VAR), followed by the name of the random eﬀect and the dimension of the covariance matrix, q (integer number, space separated). If an analysis type PC has been speciﬁed, a second number specifying the rank of the estimated covariance matrix, needs to be given (even if this is equal to q).
The name can simply the random eﬀects name as speciﬁed for the model of analysis. Alternatively, it can be of the form “vn1+vn2” where vn1 and vn2 are names of random eﬀects speciﬁed in the MODEL block. This denotes that the two random eﬀects are assumed to be correlated and that their joint covariance matrix is to be estimated¹ ¹Currently implemented for full-rank, ‘standard’ analyses only ! .
Again, a qualiﬁer DIAG or NOSTART, as described above for the residual covariance matrix (see subsubsection 4.11.1.1) is recognized.
2.: The q(q + 1)∕2 elements of the upper triangle of the covariance matrix, given row-wise (i.e. σ₁², σ₁₂, …, σ_1q, σ₂², σ₂₃, …, σ_q²).
Again, these can be given several elements per line (space separated, taking care not to exceed the total line length of 78 characters), or one per line, or a mixture – WOMBAT will attempt to read until it has acquired all elements required.
As above, if DIAG is speciﬁed, only the q variances are read and with NOSTART no values are acquired.

4.12 Additional functions of covariance components

In addition, users can deﬁne other functions of covariance components which should be calculated and for which sampling errors should be approximated. This is done in a block entry, beginning with a line containing the code SE+USR (can be abbreviated to SE+), and ending with a line beginning with END. The block should then contain one line for each function to be calculated. The content of the line depends on the type of function. Three types are recognised.

1.

Linear combinations (weighted sums) of the covariance components in the model of analysis. For these, space separated entries o should be

(a): The code SUM at the beginning of the line.
(b): A name to identify the sum to be calculated.
(c): An entry of the form n(w) for each component of the weighted sum, with n the running number of the covariance component and w the weight it is to be multiplied with. If w is unity, it can be omitted, i.e. the entry n is interpreted as n(1).

2.

Ratios of two covariance components. For these, the line should have three entries

(a): The code VRA at the beginning of the line.
(b): A name for the variance ratio to be calculated.
(c): The running number of the covariance component in the numerator.
(d): The running number of the covariance component in the denominator.

3.

Correlations, i.e. the ratio of a covariance component and the square root of the product of two other covariances. Line entries for these are

(a): The code COR at the beginning of the line.
(b): A name to identify the correlation to be calculated.
(c): The running number of the covariance component in the numerator.
(d): The running number of the ﬁrst covariance component in the denominator.
(e): The running number of the second covariance component in the denominator.

EXAMPLE:
SE+USR 
  SUM  siga+pe  1  2 
  VRA  repeat   5  4 
END
Consider a univariate analysis with repeated records per individual. Fitting additive genetic and permanent environmental eﬀects of the animal, variance components in the model are σ_A², σ_PE² and σ_E² (residual), with running numbers 1, 2 and 3, respectively. WOMBAT automatically calculates the phenotypic variance σ_P²= σ_A² + σ_PE² + σ_E² and gives it running number 4. To calculate the repeatability and approximate its sampling error, we ﬁrst need to deﬁne the sum of σ_A² and σ_PE² as a new covariance (which receives running number 5), and then deﬁne the repeatability as the ratio of this component and σ_P².

HINT: Run WOMBAT with the --setup option to begin with. Inspect the ﬁle ListOfCovs generated in this step – this gives a list of running numbers for individual covariance components.

4.13 Pooling estimates of covariance components

Information related to pooling of covariance components using a (penalized) maximum likelihood approach is speciﬁed in a block entry, beginning with a line containing the code POOL and ending with a line beginning with END. Within the block, the following directives are recognized:

PSEUPED

followed (space separated) by a three letter code specifying the assumed pedigree structure and values on sizes or numbers of families. The following pseudo-pedigree codes are available:

PHS

denotes a simple balanced paternal half-sib design. Optionally, two integer numbers giving the numbers of sires and the number of progeny per sire, respectively, can be given (space separated, on the same line) after the code. If not given, default values of 10 and 4 are used.

This option is suitable for a simple animal model only. WOMBAT will check that a) there is only one random eﬀect ﬁtted, and b) that this has covariance option NRM. If the MINPAR option (see below) is used, WOMBAT cannot perform these checks; hence the DIRADD code together with the name of the random eﬀect, as described below, need to be given.

HFS

implies a balanced hierarchical full-sib design comprised of s sires, d dams per sire and n progeny per dam. Assumed values for s, d and n can be given (space-separated) on the same line. If omitted, default values of s = 10, d = 5 and n = 4 are used. This option is suitable for a simple animal model or a model ﬁtting maternal permanent environmental eﬀects in addition (i.e. accommodates a maximum of 2 random eﬀects). Again, if the MINPAR option is used, codes DIRADD and MATPE need to be given in addition.

BON

selects a design comprising 8 individual per family in two generations, due to Bondari et al. [5]. Optionally, this code can be followed (space separated) by the number of such families (integer); if not given, a default value of 2 is used. For this design, expectations of covariances between relatives due to direct and maternal eﬀects (genetic and permanent environmental) are available (i.e. a maximum of three eﬀects). In order for WOMBAT to ‘know’ which random eﬀect has which structure, additional information is required. This should comprise one additional line per random eﬀect ﬁtted, with each line consisting of a keyword specifying the type of random eﬀect followed (space separated) by the name of the eﬀect as speciﬁed in the model of analysis. Keywords recognized are DIRADD for direct additive genetic, MATADD for maternal genetic and MATPE for maternal permanent environmental eﬀects.

EXAMPLE:

MODEL 
   RAN     animal  NRM 
   RAN     gmdam   NRM 
   RAN     pedam   IDE 
   trait   bwgt    1 
   trait   wwgt    2 
   trait   ywgt    3 
END MOD 
VAR animal   3  NOS 
VAR gmdam    3  NOS 
VAR pedam    3  NOS 
VAR residual 3  NOS 
POOL 
   PSEUPED    BON  10 
      DIRADD  animal 
      MATADD  gmdam 
      MATPE   pedam 
   SMALL      0.001 
   DELTAL     0.005 
END

USR

Other designs and models are readily accommodated by specifying the expectations of covariances between family members for each random eﬀect ﬁtted (excluding residuals - this is set automatically to be an identity matrix). This is selected by the USR option which must be followed (space separated) by an integer specifying the family size. Optionally, a second number giving the number of families is recognized (a default value of 2 is used if not given). Then the upper triangle of the matrix of coeﬃcients in the expectation of covariances has to be given for each random eﬀect ﬁtted. For multiple random eﬀects, these have to be given in the same order in which they have been speciﬁed in the VAR statements in parameter ﬁle, and the matrix for each eﬀect has to begin on a new line.

EXAMPLE:

MODEL 
  RAN  animal  NRM 
    ... 
END MOD 
VAR   animal   4  NOSTART 
VAR   residual 4  NOSTART 
POOL 
   PSEUPED  USR  5 
   1.0  0.50  0.50  0.50  0.50 
   1.0  0.25  0.25  0.25 
   1.0  0.25  0.25 
   1.0  0.25 
   1.0 
END

This shows the coeﬃcients for direct additive genetic eﬀects for a family comprising a sire (individual 1) with four progeny from unrelated dams.

SMALL

followed (space separated) by a real number which gives the lower limit (≤ 1) for smallest eigenvalue allowed in the combined matrices. If this is not speciﬁed, a default value of 0.0001 is used.

DELTAL

followed (space separated) by a real number specifying the convergence criterion: if the increase in log likelihood between iterates falls below this value, convergence is assumed to have been achieved. If not speciﬁed a stringent default of 0.00005 is used.

MINPAR

The default assumption for the parameter ﬁle used is that it is a ‘full’ parameter ﬁle as for a corresponding, multivariate analysis. However, the information on pedigree and data ﬁle and their layout are not actually required. Hence the option MINPAR (on a line by itself) is provided which allows use of a ‘minimum’ parameter ﬁle, switching oﬀ some of the consistency checks otherwise carried out. If used, MINPAR must be the ﬁrst entry within the POOL block.

The minimum information to be given in the parameter ﬁle must comprise:

1.: The ANALysis statement
2.: A VAR line for each covariance matrix, together with the NOSTART option telling WOMBAT not to expect a matrix of starting values.
3.: The POOL block, including statements showing which random eﬀect represents which type of genetic or non-genetic eﬀect.

EXAMPLE:

ANAL MUV 14 
VAR  animal    14  NOSTART 
VAR  residual  14  NOSTART 
POOL 
   MINPAR 
   SMALL  0.001d0 
   PSEUPED  hfs  100  10  4 
     DIRADD  animal 
END

SINGLE

The default form of input for results from part analysis is to read estimates from separate ﬁles, in the form of output generated by WOMBAT when carrying out multiple analyses of subsets of traits. Alternatively, all information can be given in a single ﬁle. This is selected by the option SINGLE, followed (space separated) by the name of the input ﬁle (same line). The layout required for this ﬁle is described in subsubsection 6.6.4.2.

PENALTY

followed (space separated) by a code word(s) deﬁning the type and strength of penalty to be applied. Codes for the penalty type recognised are:

CANEIG: selects a penalty on the canonical eigenvalues. This has to be followed (space separated) by either ORG or LOG specifying a penalty on eigenvalues on the original scale or transformed to logarithmic scale (no default).
COVARM: speciﬁes shrinkage of a covariance matrix towards a given target.
CORREL: chooses shrinkage of a correlation matrix towards a given target correlation matrix.

Either COVARM or CORREL can be followed (space separated) by the keyword MAKETAR. If this is given, WOMBAT determines the shrinkage target as the phenotypic covariance (or correlation) matrix obtained by summing estimates of covariances for all sources of variation from the preceding, unpenalized analysis. If this is not given, the upper triangle of the target matrix is expected to be read from a ﬁle with the standard name PenTargetMatrix; see subsubsection 6.6.7.1.

The last entry on the line relates to the tuning factor(s) to be used: If a single penalized analysis is to be carried out, the corresponding tuning factor should be given (real value). To specify multiple penalized analyses, specify the number of separate tuning factors multiplied by −1 as an integer value (e.g. -3 means 3 analyses), and list the corresponding tuning factors space separated on the next line.

EXAMPLE:
1.
Shrink all correlation matrices towards the phenotypic correlation matrix using a single tuning factor of 0.1, and calculate the shrinkage target from unpenalized results.
PENALTY CORREL  MAKETAR  0.1
2.
Shrink canonical eigenvalues on the logarithm scale towards their mean, using 5 diﬀerent tuning factors
PENALTY CANEIG LOG  -5 
0.01  0.1  0.5  1.0  2.0

4.14 Dependencies among ﬁxed eﬀects

WOMBAT requires a coeﬃcient matrix in the mixed model equations which is of full rank. Hence, constraints must be placed on the ﬁxed eﬀects part of the model if more than one ﬁxed eﬀect is ﬁtted. By default, the ﬁrst level of each cross-classiﬁed ﬁxed eﬀect other than the ﬁrst is ‘zeroed out’ for each trait to account for dependencies. If there are additional dependencies, these should be identiﬁed and speciﬁed explicitly prior to each analysis.

WOMBAT performs a simple least-squares analysis on the ﬁxed eﬀects part of the model, attempting to ﬁnd such additional dependencies. However, this procedure should not be relied upon, in particular for large data sets where numerical errors tend to accumulate suﬃciently to obscure identiﬁcation. Dependencies not properly taken into account can lead to problems during estimation !

Additional eﬀects to be zeroed out are speciﬁed in a block entry. The block begins with a line containing the code ZEROUT (can be abbreviated to ZER), and ﬁnishes with a line beginning with END. The block then should contain one line for each additional dependency. Each line should contain three entries :

(a): The name of the ﬁxed eﬀect, as speciﬁed in the MODEL block.
(b): The ‘original’ code for the level to be zeroed out, as encountered in the data ﬁle.
(c): The trait number; this can be omitted for univariate analyses.

4.15 Random eﬀects to be treated as ﬁxed

In some instances, it is desirable to treat selected levels of a random, genetic eﬀect as if it were ‘ﬁxed’. A typical example is the analysis of dairy data under a sire model. If the data includes highly selected proven bulls which only have records from their second crop of daughters, we might want to treat these bulls as ‘ﬁxed’ as we would expect estimates of the genetic variance to be biased downwards otherwise.

Treating selected random genetic eﬀects levels as ﬁxed is achieved by replacing the diagonal in the numerator relationship matrix by a value of zero when setting up the inverse of the numerator relationship matrix. If there is any pedigree information for the eﬀect, this is ignored.

N.B.: Treating levels of random eﬀect(s) as ﬁxed may lead to additional dependencies in the ﬁxed eﬀects part of the model, especially for multivariate analyses. Great care must be taken to identify these and specify them explicitly, as outlined above (section 4.14). WOMBAT has no provision to account for this possibility !

Random genetic eﬀects levels to treated as ﬁxed are speciﬁed in a block entry. The block begins with a line containing the code PSEUDOFX (can be abbreviated to PSE). This can be followed (space separated) by a real number. If given, this value (which should be a small positive number, e.g. 0.0001 or 0.001) is used instead of the value of 0 when specifying the contributions to the inverse of the numerator relationship matrix. Eﬀectively, this treats the respective eﬀect level as “just a little bit random”. This option is provided as a mean to counteract additional dependencies in the model (see warning above). It is particularly useful for prediction, and should be used very carefully with estimation runs. As usual, the block ﬁnishes with a line beginning with END. The block then should contain one line for each eﬀect. Each line should contain two entries :

(a): The name of the random eﬀect, as speciﬁed in the MODEL block.
This should be a genetic eﬀect, distributed proportionally to the numerator relationship matrix among animals.
(b): The ‘original’ code for the level to be treated as ’ﬁxed’, as given in the pedigree ﬁle.

N.B.: This option has only undergone fairly rudimentary testing !
It is not available in conjunction with the PX-EM algorithm.
Selecting this option prohibits re-use of inverse NRM matrices set up in any previous runs.

4.16 Covariance components to be treated as ﬁxed

WOMBAT provides limited facilities to ﬁx certain covariance components at their input values, while maximising the (log) likelihood with respect to the remaining parameters. Again, this information is given in a block entry. The block begins with a line containing the code FIXVAR (no abbreviation) and, as usual, ends with a line containing the code END. Each line within the block is then expected to have the following entries:

(a): The name of the random eﬀect, as speciﬁed in the MODEL block.
(b): The code ALL to signify that all pertaining covariances are ﬁxed.

In the moment, no options to ﬁx individual elements of covariance matrices are recognised.

4.17 Special information

A place to collect miscellaneous options which do not ﬁt into the categories above is provided through a block entry beginning with a line SPECIAL, and ending with a line END.

Options are available to address the following problems:

4.17.1 Weighted analysis

A weighted analysis is speciﬁed by a line with the following space separated entries:

(a)

The code WEIGHT at the beginning of the line.

(b)

The name of the column in the data containing the weight.

(c)

Optionally, for standard uni- and multivariate analyses, this can be followed by a code to select the type of weighting to apply:

none: is the default: each record is multiplied by the weight given prior to the analysis.
DESIGN: causes entries of “1” in the design matrices to be replaced with the weight speciﬁed (observations are not scaled).
RESID: multiplies the diagonal elements of the residual covariance matrix pertaining to the observation with the weight given; in addition, any non-zero elements for a pair of records (on the same individual) are scaled with the geometric mean of the corresponding pair of weights (i.e. ).

EXAMPLE: Consider an animal with two records and corresponding weights w₁ = 0.8 and w₂ = 2. Let the unweighted residual covariance matrix be
$( ) ( ) 10 −5 with inverse 0.1143 0.0286 −5 20 0.0286 0.0571$
The weighted matrix is then
$( 8 −6.325) ( 0.14286 0.02259) −6.325 40 with inverse 0.02259 0.02857$

4.17.2 Covariables that are zero

WOMBAT now checks for covariables which have a value of zero. If you ﬁt a covariable which has such values, it expects you to conﬁrm that this is intentional. This is done through a single line with space separated entries:

(a): The code COVZER at the beginning of the line.
(b): The full name of the covariable as given in the model statement (i.e. including brackets, number of coeﬃcients and, if applicable, type).
(c): A three letter option: FIT to conﬁrm that a value of zero for this covariable is a valid code, or IGN to skip such records (any other option will lead to a programmed error stop).

4.17.3 Animals that are clones

WOMBAT will stop if it encounters non-consecutive records for the same subject in diﬀerent parts of the data ﬁle, assuming that the data has not been sorted as required. However, there are situations where this is a ‘planned’ scenario, for example to ﬁt data on clones, where we want the same genetic eﬀect but diﬀerent residual eﬀects for members of a clone. Hence, this check can be switched oﬀ by including a line with the code CLONES in the SPECIAL block.

4.17.4 Standard errors for ﬁxed and random eﬀect solutions

WOMBAT reports solutions for ﬁxed and random eﬀects ﬁtted – these are calculated as a by-product by all algorithms to obtain REML estimates of variance components. However, standard deviations/errors are only reported for those algorithms which invert the coeﬃcient matrix in the mixed model equations. An option is provided to enforce this – this is invoked by a line (in a SPECIAL block) with the code FORCE-SE. If standard errors have been obtained automatically, this has no eﬀect. Otherwise, it forces calculation of the inverse of the coeﬃcient matrix at convergence.

N.B.: For large analyses, this can take quite some extra time.

4.17.5 GWAS from ssGBLUP: backsolving for marker eﬀects

For analyses incorporating genomic information in the relationship matrix for additive genetic eﬀects, WOMBAT provides the opportunity to ‘back solve’ from predicted breeding values for marker eﬀects and their standard error at the end of a BLUP or REML run, providing a GWAS scan; see Aguilar et al. [1]. This is invoked by a line (in a SPECIAL block) with the code BSOLVE-SNP rname, where rname is the name of the random eﬀect representing additive genetic eﬀects.

Details about additional information and input ﬁles needed and the output generated are available with Example 22.

4.17.6 Repeated records

WOMBAT attempts to check the data for records representing repeated measurements for a trait and individual, and stops when this is fairly low. This can be overridden to some extent: Analyses with repeated records for a trait for less than 20% and more than 10% of individuals can be enabled by a single line with the code REPEAT at the beginning of the line.

Standard multivariate analyses (MUV) with repeated records can represent a somewhat diﬃcult scenario, as proper modelling of the residual covariance structure relies on diﬀerentiating between records for diﬀerent traits are taken simultaneously and thus should be assumed to be subject to a common,temporary environmental (residual) covariance, and which should not. There are four strategies which can be chosen by specifying a single line with the following space separated entries:

(a)

The code RPTCOV at the beginning of the line.

(b)

A seven-letter qualiﬁer to select he strategy to be used. Valid terms are:

INDESCR

to ﬁt a residual covariance σ_{E ij} between all observations for traits i and j for an individual (this was the previous default).

ALIGNED

to ﬁt a residual covariance σ_{E ij} only between obervations with the same running running number, e.g. the ﬁrst record for trait i and the ﬁrst record for trait j (this is the current default).

ZEROALL

for the case where all temporary environmental covariances σ_{E ij} are assumed to be zero. This should be accompanied by corresponding starting values for the residual covariance components.

TSELECT

when none of the other options are appropriate and where the data ﬁle has a column with a ‘time of recording’ indicator (integer). The name of this column must be given on the same line, space-separated after TSELECT. This allows for proper identiﬁcation of records for diﬀerent traits taken at the same time – and thus assumed to have a non-zero, temporary environmental covariance – and those which are not when there is an arbitrary pattern of missing records.

EXAMPLE:

SPECIAL 
   RPTCOV TSELECT  mtime 
END

Speciﬁcation of this option is required for multivariate analyses combining traits with single and repeated records.

4.17.7 Fitting SNP eﬀects

When using the run option --snap, the covariable in the model representing SNP eﬀects needs to be identiﬁed by a line in a SPECIAL block containing

(a): The code SNPEFF at the beginning of the line.
(b): Space separated, the full name of the covariable, as speciﬁed in the MODEL block (i.e. including brackets and the order of it).
(c): Optional, for analyses estimating more than one regression coeﬃcient: Space separated, the code COVOUT to specify that the upper triangle of the covariance matrix of regression coeﬃcients is written out.

In addition, a COVZER statement may be required to ensure WOMBAT treats count of zero as valid covariable values.

For ‘house-keeping’ reasons, WOMBAT requires that these covariables are speciﬁed (in the MODEL block) before any other, ‘regular’ covariables to be ﬁtted.

EXAMPLE:

SPECIAL 
   SNPEFF  snp(1) 
   COVZER  snp(1)  FIT 
END

4.17.8 Sampling to approximate standard errors

The default in drawing samples from distribution of estimates of covariance matrices is to sample the parameters estimated, i.e. the elements of the Cholesky factor of the matrix of interest, and to construct the matrix samples from these. This yields samples within the parameter space. However, it can also yield substantial diﬀerences in the mean of samples and their variances from the ‘input’ values. Hence an option is provided to select sampling of the covariance matrices instead. This will yield means close to the REML estimates and estimates of sampling variances closer to those derived from the inverse of the average information matrix, but is likely to yield samples out of the parameter space.

To select a single covariance matrix to be sampled when approximating sampling variances (see subsection 5.2.9) requires a single line in a SPECIAL block with at least two entries:

(a): The code SAMPLEAI at the beginning of the line.
(b): The name of the random eﬀect as speciﬁed in the MODEL block (space separated).
(c): Optionally, this can be followed (space separated) by the code COVAR to select sampling of covariance matrices.
(d): Optionally again, COVAR can be followed (space separated) by the code TRUNC which invokes modiﬁcation of samples with eigenvalues close to zero (truncating at 10⁻⁴) to ensure the sample has the rank speciﬁed.

EXAMPLE:

SPECIAL 
   SAMPLEAI  animal 
END

4.17.9 Fitting “social” genetic eﬀects

To ﬁt a model with an additional random eﬀect representing individuals’ competitive genetic eﬀects, a random regression model analysis has to be selected – ﬁtting random eﬀects at an order of 1 with a basis function of “1” (option ONE) yields an equivalent model to a standard, univariate analysis whilst providing the ﬂexibility to model residual variances as heterogeneous. WOMBAT then requires a “social” group code to be supplied and collects the identities of the individuals in each group as part of its set-up steps.

Additional information required is to be supplied by a single line in the SPECIAL block. This should contain the following, space separated entries:

(a): The code SOCIAL at the start of the line.
(b): The name of the random eﬀect representing the “social” genetic eﬀect. This can be abbreviated by omitting the part in brackets specifying the order of ﬁt and basis function (e.g. (1,ONE)).
(c): The name of the ﬁxed or extra eﬀect representing the “social” group.
(d): An Integer variable giving the maximum group size.
(e): Optional: A Real variable giving the dilution factor d to be used. If omitted, a default value of d = 0 (i.e. no dilution) is used.
(f): Optional: If a non-zero dilution factor has been given, an additional, four-letter code is recognised which speciﬁes the “target group size” for which competitive genetic variance components are to be estimated. Codes available are NTWO which targets a group size of 2 (by using an entry of [1∕(n−1)]^d in the corresponding design matrix) and NBAR which targets the average group size (using coeﬃcients [(n−1)∕(n−1)]^d in the design matrix; see Bijma [4]). If omitted, the default used is NTWO.

For each run where a dilution factor has been speciﬁed, WOMBAT appends the value of the dilution factor used together with the corresponding maximum log likelihood to a ﬁle in the working directory named LogL4Quapprox.dat (see subsection 7.3.8). These represent points on the proﬁle likelihood curve for the dilution factor, and a quadratic approximation can be used to determine its maximum. The run time option --quapp is provided to assist in this task.

4.17.10 Fitting ‘explicit’ genetic groups

When unknown parent groups (UPG) are to be treated as random, an alternative to including them in the inverse of the relationship matrix among additive genetic eﬀects [see 47] is to ﬁt them explicitly, i.e. to ﬁt them as a separate random eﬀect.

y = Xb + Zu + ZQg + e

with y, b, u, g and e denoting the vectors of observations, ﬁxed eﬀects, random eﬀects, group eﬀects and residuals, respectively, and X and Z the corresponding design matrices. Q is the matrix linking individuals’ additive genetic eﬀects to groups, with each row of Q potentially containing multiple non-zero elements, tracing the proportional contribution of genetic groups through the pedigree, which sum to unity. WOMBAT allows for Q to be supplied by the user or can set up Q in core if group codes for missing sire and dam identities are given.

Instructing WOMBAT to ﬁt genetic groups thus requires these to be speciﬁed as a random eﬀect in the model of analysis, and the corresponding (co)variance components statements need to be provided. In addition, an entry in a SPECIAL block at the bottom of the parameter ﬁle is needed. This should consist of a single line, with the following, space separated entries:

(a)

The code GENGROUPS at the start of the line.

(b)

The name of the random eﬀect in the model representing genetic group eﬀects.

(c)

An integer variable giving the number of genetic groups to be ﬁtted (NB: This must be the actual number, not just some number at least as big).

(d)

A variable treated either as a scale factor or a special option:

When proportions of membership to genetic groups (i.e. elements of Q) are supplied, the variable is the assumed to be scale factor by which the proportions have been multiplied.
If the variable has a value of −9, it signiﬁes that the genetic group codes for sires and dams are to be read instead. These are used by WOMBAT to build the Q-matrix in core.

(e)

The name of the random eﬀect in the model representing individuals’ additive genetic eﬀect corresponding to the genetic groups.

For example,

EXAMPLE:

MODEL 
       ... 
# specify animal genetic effects first 
  RAN animal  GIN 
  RAN gggrps  IDE 
       ... 
END 
       ... 
SPECIAL 
# specify which random effect represents groups, 
# no. of levels and scale factor for proportions 
  GENGROUPS gggrps 27 1 animal 
       ... 
END

Depending on the covariance option for additive genetic eﬀects, information on Q or the UPG codes is expected to be read either from the corresponding pedigree ﬁle (NRM), starting with column 5 or from the *.codes ﬁle (GIN), beginning in column 4.

The elements of the rows of Q are read as a string of g space separated values (with g the number of groups as given in the SPECIAL block) for each individual. These values are assumed to have been obtained by multiplying the proportions with the scale factor speciﬁed.

For example, for g = 5 and a scale factor of 1000, the *.codes ﬁle might be:

EXAMPLE:

1 2301   1   900     0   100     0     0 
2 2302   1     0   900     0     0   100 
3 2303   2   100     0   800   100     0 
4 2304   2     0     0     0     0  1000 
            ...

with columns 1 and 2 the running number and original animal code, column 3 the type of animal (1 = non genotyped, 2 = genotyped; for compatibility, this code needs to be included for all run options if genetic groups are ﬁtted, including those that do not treat genotyped animals diﬀerently). Columns 4 to 8 then give the proportions of group membership, multiplied by the scale of 1000 for each of the 5 groups.

If it is chosen to supply UPG codes instead, these are expected to be given as columns 5 (sire) and 6 (dam) in the pedigree or columns 4 (sire) and 5 (dam) in the *.codes ﬁle. NB: For this option the genetic group codes need to be coded from 1 to the number of groups

4.17.11 In core storage

The default for WOMBAT is to repeatedly read data and relationship information from disk so as to minimize memory requirements. In-core storage of selected information can be speciﬁed with a line in a SPECIAL block. This should begin with the code INCORE followed (space separated) by one or more qualiﬁer(s) choosing which information is to be stored in core. Valid qualiﬁers are NRM (inverse of numerator relationship matrix), GIN (*.gin matrix), DATA (recoded data), RAW (raw data), ZRE (*.matrix or *.chlsky required or reduced rank analyses usng AI-REML) and ALL (all ﬁve).

EXAMPLE:

SPECIAL 
       ... 
   INCORE GIN DATA 
       ... 
END

4.17.12 Calculation of alternative outlier model statistics

Calculation of alternative outlier model (AOM) statistics for residuals or random eﬀects ﬁtted is invoked by specifying the option AOM-RES or AOM-RND by itself on a line in a SPECIAL block in the parameter ﬁle.

EXAMPLE:

SPECIAL 
       ... 
   AOM-RES 
       ... 
END

The AOM options are implemented for uni- and full rank standard multivariate analyses only; they are ignored for reduced rank (PC) analyses and random regression models. Calculations are carried out for one observation or random eﬀects level at a time. For large models or cases with dense coeﬃcient matrices C, this can take quite a long time.

AOM statistics for residual eﬀects, R⁻¹∕ ∘ ---------------------------
Dia(R− 1 − R− 1WC − 1W ′R −1) , are reported together with residuals in Residuals.dat. This ﬁle has one row per record with up to seven columns. Columns 1 to 3 are written out for all analyses and contain the residual, the predicted observation and the actual observation (for reference). Specifying AOM-RES adds the diagonal element of the projection matrix (p_ii; column 4), the respective element of R⁻¹ (the nominator of the AOM statistic; column 5) and the AOM statistic (column 6). For univariate analyses, the corresponding diagonal element of the “Hat” matrix (σ²[1 − σ²p_ii] with σ² the residual variance) is given in addition (column 7).

AOM statistics for random eﬀects, G⁻¹∕ ∘ ----------------------
Dia(G −1 − G −1CZZG − 1) , are reported alongside the corresponding solutions in the RnSoln_*.dat ﬁles. This ﬁle has up to 10 columns. If AOM-RND has been speciﬁed, the last three columns give the diagonal element of the covariance matrix (T_ii), the element of G⁻¹ ((G{-1}u)_i) and the AOM statistic, respectively. Where the denominator is essentially zero (e.g. for additive genetic eﬀects for parent without records), the AOM statistics is set to zero.

4.17.13 Analyses using a Gaussian Kernel matrix

Kernel based prediction using genomic information is attracting increasing interest. This can be parameterised as

To facilitate estimation of the bandwidth parameter, WOMBAT has been adapted to calculate the inverse of the kernel matrix for a chosen value of 𝜃, with d_ij the ’distance’ for individuals i and j.

To ﬁt a Gaussian kernel matrix as ‘relationship matrix’, the random eﬀect which this matrix pertains to needs to be speciﬁed with the covariance option GIN. The corresponding *.gin ﬁle should follow the same format as for other eﬀects with such covariance structure, except that the elements given should be the squared distances, d_ij² and that the diagonals of 0 can be omitted. Only rudimentary checks of the input values are performed, e.g. WOMBAT stops if a zero distance between a pair of individuals is found, as this will result in a non-positive deﬁnite matrix.

The bandwidth parameter to be used needs to be speciﬁed in a SPECIAL block. This should contain a line starting with the code KERNEL followed (space-separated) by the name of the random eﬀect as given in the MODEL block, and the value of 𝜃. E.g. for random eﬀect animal and 𝜃 = 1.5

EXAMPLE:

SPECIAL 
       ... 
   KERNEL animal   1.5 
       ... 
END

WOMBAT will then construct the kernel matrix for the given value of 𝜃 and attempt to invert it, as well as calculate its log determinant. K is assumed to be dense and is automatically stored ‘in-core’. Inversion is carried out using LAPACK [3] routine DPFTRF [18] which requires a positive deﬁnite matrix.

On completion of the estimation step for a given value of 𝜃, it is written to the ﬁle LogL4Quapprox.dat together with the corresponding maximum log likelihood. To determine the next value on the proﬁle likelihood for 𝜃 to be evaluated, WOMBAT can be run with the command line option --quapp (see subsection 5.2.12) which performs a quadratic approximation of the proﬁle log likelihood for 𝜃.

4.17.14 Correlations for random regression analyses

By default, WOMBAT calculates and writes out a small number of correlations on the ’original scale’ for the estimated covariance functions. These are primarily given to assist with checking your own calculations to obtain the correlations you are interested in - covariances and correlations are easily calculated using the ﬁle of evaluated basis functions provided.

The complete correlation matrix is not provided by default as this might result in big output ﬁles and many calculations needed. However, if you really want correlations for all values of the control value, you can specify this by a single line in a SPECIAL block containing the code RRCORR-ALL. If the analysis ﬁts a single control variable, this writes out not only RanRegCorrels.dat, but also writes an additional ﬁle RanRegCorrAll.dat (see subsection 7.2.4) with the same information in a format more convenient for further processing, e.g. plotting.

4.17.15 Genomic relationship matrices

WOMBAT has a special module for a pre-analysis step to carry out a number of calculations related to genomic relationship matrices and single step analyses.

Documentation on the various options available is given in a separate pdf ﬁle distributed with Example 20.

4.17.16 Penalized estimation

WOMBAT can carry out estimation imposing a penalty on the log likelihood function to improve the sampling properties of estimates. Three types of penalties are accommodated:

1.: A penalty regressing the so-called canonical eigenvalues towards their mean, as described by Meyer and Kirkpatrick [36], also referred to as ‘bending’. This can be done for the eigenvalues directly or after transformation to logarithmic scale. In addition, there is a choice of parameterisation.
As the canonical transformation involves only two matrices, this type of penalty is restricted to multivariate analyses ﬁtting a simple animal model or random regression analyses ﬁtting animals’ genetic and permanent environmental eﬀects only.
2.: A penalty proportional to the matrix divergence between a covariance matrix to be estimated and a speciﬁed target matrix. This will shrink the estimate towards the target. Options are available to shrink a covariance matrix or a corresponding correlation matrix. A particular choice of target is the phenotypic covariance or correlation matrix estimated in a standard (unpenalized) analysis [see 37] – hence WOMBAT provides the facility to write out such target matrices. While this could, in principle, be applied to multiple matrices, the current implementation is restricted to a single matrix.
3.: A penalty on a correlation matrix, shrinking the corresponding partial auto-correlations towards zero or their phenotypic counterparts. The penalty is derived assuming a (shifted) Beta distribution and its stringency is regulated through choice of the so-called prior eﬀective sample size.

Penalized estimation is invoked by one or more lines in a SPECIAL block. It generates several additional output ﬁles, described in subsection 7.2.11.

4.17.16.1 Selecting a penalty

Penalized estimation is speciﬁed by a line with space separated entries:

(a)

The code PENALTY at the beginning of the line.

(b)

A code specifying what type of penalty is to be applied. Values recognised are:

CANEIG: selects a penalty on the canonical eigenvalues.
COVARM: speciﬁes shrinkage of a covariance matrix.
CORREL: chooses shrinkage of a correlation matrix towards a given target correlation matrix.
KANEIG: selects a penalty on the canonical eigenvalues, obtained assuming a Beta distribution.
KORREL: selects a penalty on individual correlations, assumed to have a Beta distribution.
PACORR: chooses shrinkage of a the matrix of partial auto-correlations for a random eﬀect towards the corresponding phenotypic or an identity matrix.

The following variables depend on the form of penalty chosen.

For CANEIG:

(c)

A character variable to specify the ‘scale’ to be used. Options available are:

ORG: selects a penalty on the canonical eigenvalues on the original scale.
LOG: speciﬁes a penalty on the canonical eigenvalues after transformation to logarithmic scale.
LOGB: is similar to LOG but a penalty is applied to both log(λ_i) and log(1 − λ_i), where λ_i denotes the i−th canonical eigenvalue.

(d)

A character variable to specify the parametrisation of λ_i to be used. Options available are:

ORG: estimate λ_i directly.
LOG: estimate log(λ_i), i.e. λ_i transformed to logarithmic scale.
TIC: estimate log(λ_i∕(1 − λ_i)), i.e. λ_i transformed to logistic scale.

This variable is optional; if omitted, it is set to the same value as that given for the scale of the penalty (LOG for LOGB).

(e)

A real or integer variable specifying either the tuning factor ψ to be used, or, if estimates for multiple tuning factors are to be obtained, the number of ψ’s to be read subsequently or the number of diﬀerent ranges of ψ’s to be stepped through:

If a real value is given, WOMBAT interprets this as the (single) tuning factor to be used, and expects no further input.
N.B. This value must contain a decimal point to be recognised as such!
EXAMPLE:
```
SPECIAL 
   PENALTY  CANEIG  LOG 5.0 
END
```
speciﬁes a penalty on the canonical eigenvalues transformed to logarithmic scale for a tuning factor of ψ = 5.0.
If a positive integer is found, this is interpreted as the number of diﬀerent values for ψ to be used. These are to be read in a space-separated list, beginning on the next line.
EXAMPLE:
```
SPECIAL 
   PENALTY  CANEIG  ORG  6 
   0.5  1.0  2.5  5.2  8.7  11.0 
END
```
If a negative integer, −n is speciﬁed, this is interpreted as the number of ranges of ψ to be considered. These are expected to be found on the following n lines, with one triplet of space-separated real values per line, consisting of starting value, end value and step size.
EXAMPLE:
```
SPECIAL 
   PENALTY  CANEIG  LOG TIC -2 
   0.0  3.0  0.5 
   5.0  8.0  1.0 
END
```
Finally, a value of 0 (integer) speciﬁes that the values of ψ are to be read form a separate ﬁle. In this case, the name of this ﬁle needs to be given (space-separated from 0) on the same line. This ﬁle should give the tuning factors to be used, with each value given on a separate line.

All values of ψ given should be in ascending order.

For COVARM and CORREL:

(c)

An optional character variable:

PHENV

This option speciﬁes that the estimate of the phenotypic covariance or correlation matrix is to be written out to the ﬁle PenTargetMatrix.
It is best used in a preliminary step with a single ψ = 0 to generate the matrix to be used in subsequent penalized runs where we wish to shrink towards this matrix.
The default is for this ﬁle to exist and to be read.

EXAMPLE:

SPECIAL 
   PENALTY  COVARM PHENV  animal  0.0 
END

KLDIV

The default penalty on a matrix of size q × q is

with T the shrinkage target and C = (ψ + q + 1)∕ψ. The latter value originates from the assumption of an Inverse Wishart distribution for Σ. This option sets C = 1, i.e. applies a penalty proportional to the Kullback-Leibler divergence between Σ and T.

EXAMPLE:

SPECIAL 
   PENALTY  COVARM  KLDIV  animal 7.50 
END

(d)

The name of the random eﬀect covariance matrix to be penalized. This must match the name given in a VARIANCE statement (see section 4.11) earlier in the parameter ﬁle.

EXAMPLE:

SPECIAL 
   PENALTY  CORREL  animal -1 
   0.0 5.0 1.0 
END

(e)

Information on tuning parameter(s) to be used, as for CANEIG (see above).

For KANEIG: see example 19 For KORREL: see example 19 For PACORR:

(c)

An optional character variable:

PHENV: This speciﬁes that the shrinkage target is the phenotypic matrix of partial auto-correlations
IDENT: This speciﬁes that the shrinkage target is the identity matrix

If omitted, the default shrinkage target is the identity matrix.

(d)

The name of the random eﬀect covariance matrix to be penalized. This must match the name given in a VARIANCE statement (see section 4.11) earlier in the parameter ﬁle.

(e)

A real value specifying the ‘eﬀective sample size’ of the Beta prior (no multiple values allowed for this form of penalty).

EXAMPLE:

SPECIAL 
   PENALTY PACORR  animal  4.0 
END

4.17.16.2 Stopping after a given change in likelihood

If multiple tuning factors are given, it is sometimes desirable to stop once the deviation in the unpenalized log likelihood from the maximum (at ψ = 0) has exceeded a certain limit. This can be speciﬁed by an additional line with three space-separated entries:

(a)

The code PENALTY at the beginning of the line.

(b)

The code LIMIT.

(c)

A real variable with the value for the limit.

EXAMPLE:

SPECIAL 
   PENALTY  CORREL  KLDIV  animal -2 
   0.0  5.0   0.5 
   6.0  15.0  1.0 
   PENALTY  LIMIT  3.315 
END

For this option to work, the ﬁrst tuning factor given has to be ψ = 0!

HINT: Use run time option --–valid (subsection 5.3.2) for validation steps when using cross-validation to estimate the value for ψ to be used.

[next] [prev] [prev-tail] [front] [up]

Chapter 4The Parameter File

4.1 Overview

4.2 General rules

4.3 Run options

4.4 Comment line

4.5 Overriding default program limits

4.6 Analysis type

4.7 Pedigree ﬁle

4.8 Marker counts ﬁle

4.9 Data ﬁle

4.9.1 Simple

4.9.2 Compact

4.10 Model of analysis

4.10.1 Eﬀects ﬁtted

4.10.2 Fixed eﬀects

4.10.3 Fixed covariables

4.10.4 Random eﬀects

4.10.4.1 Random regressions

4.10.5 ‘Subject’ identiﬁer

4.10.6 ‘Extra’ eﬀects

4.10.7 Traits analysed

4.11 Covariance components

4.11.1 Residual covariances

4.11.1.1 ‘Standard’ multivariate analyses

4.11.1.2 Random regression analyses

4.11.2 Covariances for random eﬀects

4.12 Additional functions of covariance components

4.13 Pooling estimates of covariance components

4.14 Dependencies among ﬁxed eﬀects

4.15 Random eﬀects to be treated as ﬁxed

4.16 Covariance components to be treated as ﬁxed

4.17 Special information

4.17.1 Weighted analysis

4.17.2 Covariables that are zero

4.17.3 Animals that are clones

4.17.4 Standard errors for ﬁxed and random eﬀect solutions

4.17.5 GWAS from ssGBLUP: backsolving for marker eﬀects

4.17.6 Repeated records

4.17.7 Fitting SNP eﬀects

4.17.8 Sampling to approximate standard errors

4.17.9 Fitting “social” genetic eﬀects

4.17.10 Fitting ‘explicit’ genetic groups

4.17.11 In core storage

4.17.12 Calculation of alternative outlier model statistics

4.17.13 Analyses using a Gaussian Kernel matrix

4.17.14 Correlations for random regression analyses

4.17.15 Genomic relationship matrices

4.17.16 Penalized estimation

4.17.16.1 Selecting a penalty

4.17.16.2 Stopping after a given change in likelihood

Chapter 4
The Parameter File