All information on the the model of analysis and the data and pedigree ﬁles is speciﬁed through the parameter ﬁle.
The name of the parameter ﬁle should have extension ’.par’. By default,
WOMBAT expects to ﬁnd a ﬁle wombat.par in the current working directory.
Other ﬁlenames can be speciﬁed at run time as the last command line option – see
chapter 5 for details.
Setting up the parameter ﬁle is straightforward, but care and attention to detail are required to get it ‘just right’. The worked examples give ‘templates’ for various types of analyses which are readily adaptable to other problems. Parsing of the lines in the parameter ﬁle is fairly elementary, hence it is important to adhere strictly to the format described in the following in painstaking detail. Note also that parsing relies on the parameter ﬁle being a ‘plain’ text ﬁle and not containing any ‘nonprintable’ characters – e.g. do not use TAB characters instead of white spaces.
WOMBAT performs a limited number of consistency checks on the variables and models speciﬁed, but these are by no means exhaustive and should not be relied upon!
If WOMBAT does ﬁnd an obvious error, it will stop with a message like “exit WOMBAT” or, in verbose mode, “Programmed(error) stop for WOMBAT encountered”. Inconsistencies or errors not discovered are likely to wreak havoc in subsequent steps !
HINT: Use the v option at run time. This will cause WOMBAT to echo each line in the parameter ﬁle as it is read (to the screen) – if there is a programmed error stop at this stage, it is easier to ﬁnd the mistake as you know which is the oﬀending line.
The parameter ﬁle is read line by line. Each line is assumed to be 88 characters long, i.e. anything in columns 89 onwards is ignored.
Any line with a # in column 1 is considered to be a comment line and is skipped.
WOMBAT relies on speciﬁc codes at the beginning of each line (leading blank spaces are ignored) to distinguish between various types of information given. Valid codes are summarised in 4.1. Most codes can be abbreviated to 3 letters. The code and the information following it should be separated by space(s). Codes are not case sensitive, i.e. can be given in either upper or lower case letters.
Depending on the initial code, WOMBAT expects further information on the same or following lines. All variable and ﬁle names speciﬁed are treated as case sensitive. File names can be up to 30 characters long, and variable names can comprise up to 20 characters. All codes, names and variables must be separated by spaces.
The parameter ﬁle can have two diﬀerent types of entries :
EXAMPLE:
PEDS pedigrees.dat
Here PEDS is the code for pedigree information, “pedigrees.dat” is the name of the ﬁle from which pedigree information is to be read.
EXAMPLE:
MODEL FIX CGroup RAN Animal NRM TRA Weight END MODELThis shows a block entry for the model of analysis, where the model ﬁts CGroup as a crossclassiﬁed, ﬁxed eﬀect and Animal as random eﬀect with covariance matrix proportional to the numerator relationship matrix, and Weight is the trait to be analysed.
Diﬀerent entries should be speciﬁed in the order in which they are listed in the following sections.
Code  Within  Indicator for 
RUNOP  –  Line with run options [4.3 ] 
COMMENT  –  Comment line [4.4 ] 
OVERRIDE  –  Change “hardcoded”program limit [4.5 ] 
ANALYSIS  –  Code for analysis type [4.6 ] 
PEDS  –  Name of pedigree ﬁle [4.7 ] 
MRKFILE  –  Name of ﬁle with marker counts [4.8 ] 
DATA  –  Name of data ﬁle [4.9 ] 
TRNOS  DAT  Trait numbers (grouped input) [4.9.2 ] 
NAMES  DAT  Multiple column names (grouped input) [4.9.2 ] 
MODEL  –  Model of analysis [4.10 ] 
FIX  MOD  Name of ﬁxed eﬀect [4.10.2 ] 
COV  MOD  Name of ﬁxed covariable [4.10.3 ] 
RAN  MOD  Name of random eﬀect [4.10.4 ] 
RRC  MOD  Name of control variable [4.10.4.1 ] 
SUBJ  MOD  Name of variable identifying “subject” [4.10.5 ] 
EXT  MOD  Name extra variable [4.10.6 ] 
TRAIT  MOD  Trait name [4.10.7 ] 
RESIDUAL  –  Residual covariance matrix [4.11.1 ] 
ERROR  –  Residual covariance matrix [4.11.1 ] 
VARIANCE  –  Covariance matrix: random eﬀects [4.11.2 ] 
SE+USR  Additional, user deﬁned functions of covariances [4.12 ]  
SUM  SE+  Weighted sum 
VRA  SE+  Variance ratio 
COR  SE+  Correlation 
POOL  –  Options for pooling of covariance components [4.13 ] 
PSEUPED  POOL  Pseudo pedigree structure 
SMALL  POOL  Minimum eigenvalue 
ZEROUT  Fixed eﬀects levels to be set to zero [4.14 ]  
PSEUDOFX  Random eﬀects levels to be treated as random [4.15 ]  
FIXVAR  –  Fix covariance components [4.16 ] 
END 
Code  Within  Indicator for 
SPECIAL  –  Miscellaneous options [4.17 ] 
WEIGHT  SPECIAL  Weighted analysis [4.17.1 ] 
COVZER  SPECIAL  How to deal with ‘zero’ covariables [4.17.2 ] 
CLONES  SPECIAL  Ignore sorting of data ﬁle [4.17.3 ] 
FORCESE  SPECIAL  Force calculation of standard errors [4.17.4 ] 
BSOLVESNP  SPECIAL  Invoke ‘backsolving’ for SNP eﬀects [4.17.5 ] 
REPEAT  SPECIAL  Run with low proportion of repeated records [4.17.6 ] 
RPTCOV  SPECIAL  Specify residual covariance structure [4.17.6 ] 
QTLEFF  SPECIAL  Specify which covariable is a QTL/SNP eﬀect [4.17.7 ] 
SAMPLEAI  SPECIAL  Sampling to approximate standard errors [4.17.8 ] 
SOCIAL  SPECIAL  Analysis with ’social’ genetic eﬀect [4.17.9 ] 
GENGROUPS  SPECIAL  Fit ‘explicit’ genetic groups [4.17.10 ] 
INCORE  SPECIAL  Analysis storing info in core [4.17.11 ] 
AOMRES  SPECIAL  Calculate outlier statistics for residuals [4.17.12 ] 
KERNEL  SPECIAL  Gaussian kernel [4.17.13 ] 
RRCORRALL  SPECIAL  Alll correlations for RR analyses [4.17.14 ] 
HINVERSE  SPECIAL  Genomic and joint relationship matrices [4.17.15 ] 
PENALTY  SPECIAL, POOL  Invoke penalized estimation [4.17.16 ] 
While run time options (see chapter 5) are primarily intended to be read from the command line, WOMBAT also provides the facility to specify such options on the ﬁrst line of the parameter ﬁle. To be recognised, this line must start with the code RUNOP and, after space(s), all options are expected to be given on the same line (space separated), in the same form as on the command line.
WOMBAT allows for a single, optional comment line (up to 74 characters) to be speciﬁed. This is speciﬁed by the code COMMENT (can be abbreviated to COM) at the beginning of the line. Anything following the spaces after COM(MENT) is treated as comment on the analysis. It is printed in some of the output ﬁles generated by WOMBAT to assist the user, and has no other use.
WOMBAT assigns most workspaces used dynamically. However, there are a number of dimensions which require an initial value and these have been assigned default values. Their settings can be inspected by running WOMBAT with run option limits (see subsection 5.2.15). Occasionally, these defaults are too small or too large for the intended analysis or the hardware available. Hence an override mechanism is provided through a line in the parameter ﬁle consisting of the code OVERRIDE (can be abbreviated to OVER) at the beginning of the line followed (space separated) by the name of the variable to be changed and its new value. There should be one such line for each variable to be changed. Variable names recognised are:
These names can be abbreviated to the ﬁrst six letters. Values for MAXANIS and MAXEQNS apply to REML analyses or BLUP runs with direct solution for the mixed model equations – the limits for BLUP analyses via iterative solutions are obtained by multiplying the respective values with a factor of 20.
N.B.: Remember to put any such lines close to the top of the parameter ﬁle – so that they are read prior to parts specifying the data layout or model of analysis.
This is a single line entry beginning with code ANALYSIS (can be abbreviated to ANA), followed by a two or threeletter code describing the type of analysis. The following codes are recognised :
Except for UNI, this can be followed (after space(s)) by the code PC to select an analysis which ﬁts the leading principal components only for some of the random eﬀects ﬁtted and yields reduced rank estimates of the corresponding covariance matrices. For MUV and MRR the number of traits in the analysis must be given as well – this should be the last entry of the line.
EXAMPLE:
ANALYSIS MUV PC 8
speciﬁes a reduced rank, multivariate analysis for 8 traits
This is a single line entry beginning with code PEDS (can be abbreviated to PED), followed by the name of the pedigree ﬁle. There is no default name. This entry is ‘optional’, in such that it is only required if a code of NRM if speciﬁed for the covariance structure of a random eﬀect (see subsection 4.10.4). Only one pedigree ﬁle can be given. The format of the pedigree ﬁle required by WOMBAT is described in detail in section 6.3.
By default, WOMBAT ﬁts an animal model. If a sire model is to be ﬁtted, the code SIREMODEL (can be abbreviated to SIR) should be given after the ﬁlename.
Additional options, given after the ﬁlename and, if applicable the sire model option, that are recognised are +INBR or +SEX. These instruct WOMBAT to read animals’ inbreeding coeﬃcients or sex codes (1=X/2=XX) from the fourth of ﬁfth column in the pedigree ﬁle. If given, +INBR causes WOMBAT to skip calculation of inbreeding coeﬃcients and use the values supplied instead. Option +SEX is required if an NRM for Xlinked genetic eﬀects is to be set up.
This is a single line entry beginning with code MRKFILE (can be abbreviated to MRK), followed by the name of the ﬁle. There is no default name. This entry is ‘optional’, i.e. it is only required for selected analyses where the default name MarkerCounts.dat is to be replaced. The layout of the marker ﬁle required by WOMBAT is described in detail in section 6.4.
This is a block entry. The block begins with a line containing the code DATA (can be abbreviated to DAT) followed by the name of the data ﬁle. There is no default name. The general form of the data ﬁle required in described in section 6.2.
The following lines specify the record layout of the data ﬁle for all traits. There are two alternative ways of speciﬁcation.
For each trait in turn, there should be one parameter ﬁle line for each column, up to
the last column used in the analysis.
The lines can have up to 3 elements :
The block is terminated by a line with the code END.
EXAMPLE:
DATA mydata.dat TR1 traitno 2 TR1 animal 1000 TR1 fixeffect 50 TR1 weight TR2 traitno 2 TR2 animal 500 TR2 fixeffect 30 TR2 feedintake END DATAThis shows the block for an analysis reading records for 2 traits from the ﬁle mydata.dat.
If there are several traits for which the record layout is the same, the respective
record layout can be given for the whole group of traits. This avoids tedious
duplication of lines.
This alternative is selected by placing the code GRP after the name of the data ﬁle
(same line, separated by a space).
For each group of traits, the following lines need to be given :
Again, the block is terminated by a line with the code END.
EXAMPLE:
DATA mydata.dat GRP TRNOS 1 2 traitno 2 animal 1000 fixeffect 50 NAMES weight feedintake END DATAThis shows the ‘grouped’ alternative for specifying the data ﬁle layout, for two traits with the same column structure in the example above.
This is another block entry. The block begins with a line containing the code MODEL (can be abbreviated to MOD), and ﬁnishes with a line beginning with END. The block then should contain one line for each eﬀect to be ﬁtted and one line for each trait in the analysis.
Each of the ‘eﬀect’ lines comprises the following
Fixed eﬀects can be crossclassiﬁed or nested ﬁxed eﬀects or covariables. The following codes are recognised :
FIX
This speciﬁes a ﬁxed eﬀect in the model of analysis.
NB The name for a ﬁxed eﬀect should not contain a “(”, otherwise it is assumed that this eﬀect is a covariable.
A simple, oneway interaction of two variables can be speciﬁed as vn1*vn2, with vn1 and vn2 valid variables names. [Not yet implemented !]
HINT: ‘Not implemented’ here means merely that WOMBAT will not code the interaction for you – you can, of course, still ﬁt a model with an interaction eﬀect, but you a) have to insert an additional column in the data ﬁle with the code for the appropriate subclass, b) ﬁt this as if it were a crossclassiﬁed ﬁxed eﬀect, and c) specify any additional dependencies arising explicitly (using ZEROUT, see below).
COV
This speciﬁes a ﬁxed covariable. The eﬀect name should have the form “vn(n,BAF)”, where vn is a variable name, the integer variable n gives the number of regression coeﬃcients ﬁtted and BAF stands for a threeletter code describing the basis functions to be used in the regression. NB: By default, intercepts for ﬁxed covariables are not ﬁtted.
Valid codes for basis functions are
N.B.: This yields a regression equation of the form
rather than an equation of form
This should be born in mind when interpreting any solutions for regression coeﬃcients for POL covariables from WOMBAT  while there is a straightforward relationship between coeﬃcients β_{i} and b_{i}, they are not interchangeable.
For example, n = 3 denotes a cubic polynomial, i.e. comprises a linear, quadratic and cubic regression coeﬃcient, but no intercept (This diﬀers from the implementation for random regressions where n = 4 denotes a cubic polynomial).
For analyses ﬁtting spline functions, the degree of the spline is selected by specifying “L”, “Q” or “C” for linear, quadratic and cubic, respectively, immediately (no space) after the code BSP. Note that the default spline function is an equidistant Bspline (i.e. the range of values of the covariable is divided into intervals of equal size), with the number of knots and intervals determined from the number of regression coeﬃcients and the degree of the spline (k = n − d + 1 where k is the number of knots and d is the degree, d = 1 for “L”, d = 2 for “Q” and d = 3 for “C”) [32, section 2.2]. Other spline functions are readily ﬁtted as userdeﬁned basis functions.
Fitting of an intercept (in addition to deviation from means) can be enforced by preceding n with a minus sign – this is not recommended unless there are no other ﬁxed eﬀects in the model.
A covariable to be ﬁtted as nested within a ﬁxed eﬀect is speciﬁed as “vn1*vn2(n,BAF)”, with vn1 the name of the ﬁxed eﬀect. If vn1 is not ﬁtted as a ﬁxed eﬀect, it must be speciﬁed an an ‘extra’ eﬀect (see below).
WOMBAT is fussy when it encounters a covariable which has a value of zero: Covariables which can take such value are only permitted if a SPECIAL option is given which conﬁrms that these are valid codes and not ‘missing’ values; see subsection 4.17.2.
Random eﬀects include the ‘control variables’, i.e. random covariables for random regression analyses. The following codes are recognised:
RAN
This code speciﬁes a random eﬀect. It should be followed (space separated) by
the name of the random eﬀect. After the name, a threeletter code describing
the covariance structure of the eﬀect can be given.
Valid codes for covariance structures are :
N.B.: Do not use this option for other eﬀects  WOMBAT has no mechanism for checking that this option is appropriate.
For ’standard’ uni and multivariate analyses, the random eﬀect name is simply
the variable name as given for data ﬁle.
For random regression analyses, the variable name is augmented by information about the number of random regression coeﬃcients for this eﬀect and the basis functions used. It becomes “vn(n,BAF)”, analogous to the speciﬁcation for covariables above. As above, n speciﬁes the number of regression coeﬃcients to be ﬁtted. In contrast to ﬁxed covariables, however, an intercept is always ﬁtted. This implies that n gives the order, not the degree of ﬁt. For instance, n = 3 in conjunction with a polynomial basis function speciﬁes a quadratic polynomial with the 3 coeﬃcients corresponding to the intercept, a linear and a quadratic term. WOMBAT allows for diﬀerent control variables to be used to ﬁt random regressions for diﬀerent eﬀect. If the model has more than one RRC statement (see below), the speciﬁcation of random eﬀects needs to be extended to tell WOMBAT which control variable to use for which eﬀect, i.e. “vn(n,BAF,rrcn)” with rrcn the name of the control variable.
Valid codes for BAF are:
RRC
This code speciﬁes a ‘control variable’ in a random regression analysis. It
should be followed (space separated) by the name of the variable, as given in
the DATA statement.
Optionally, immediately after the name (no spaces), the range of values of the
variable to be considered can be speciﬁed as (m − n), with m the lower and n
the upper limit.
N.B.: WOMBAT expects the value of the control variable to be nonnegative (i.e. 0 or greater) and not to be a fraction (the control variable is read as a real variable but converted to the nearest integer, e.g. values of 3,.0, 3.1 and 3.4 would be treated as 3) – scale your values appropriately if necessary!
N.B.: For multivariate analyses, WOMBAT collects the range of values for the control variable (i.e. minimum and maximum) across all traits. This may be undesirable if diﬀerent traits have distinct ranges and Legendre polynomials are used as basis function  if so, use the USR option and set up your own ﬁle which maps the range for each trait exactly as you want it!
For animal model analyses, WOMBAT assumes that the code for the ﬁrst genetic eﬀect ﬁtted also identiﬁes the subject on which measurements are taken. For some analyses (in particular those not ﬁtting any additive genetic eﬀects!) and sire models this is not appropriate, and such identiﬁer needs to be supplied as an extra column in the data ﬁle.
SUBJ
This code needs to be followed by the name of the variable which identiﬁes the individual.
For some models, coding is required for eﬀects which are not explicitly ﬁtted in the model of analysis, for instance, when ﬁtting nested eﬀects. These need to be speciﬁed as ‘extra’ eﬀects.
EXT
This code, followed by the respective variable name, denotes an eﬀect which is not ﬁtted in the model but which is required to deﬁne other eﬀects.
One line should be given for each trait. It should contain the following information :
HINT: All q traits in the analysis should be speciﬁed in this manner, even if k = m for some trait(s).
Next, the parameter ﬁle needs to specify values for all (co)variance components in the model of analysis. For variance component estimation, these are the starting values used. For simple BLUP analyses and simulation runs, these are the values assumed to be the population values.
The input matrices for full rank analyses must be positive deﬁnite, i.e. cannot have eigenvalues less than the operational zero. For reduced rank (PC) analyses, some ‘zero’ but no negative eigenvalues are acceptable, provided the rank (i.e. the number of nonzero eigenvalues) is equal to (or greater than) the number of principal components to be ﬁtted.
The residual covariance matrix is speciﬁed by
Optionally, this can be followed (space separated) by a qualiﬁer:
With qualiﬁer: If DIAG is speciﬁed only the q diagonal elements (σ_{1}^{2}, …, σ_{q}^{2}) are read, and if NOSTART is given, no values are read.
Again, the residual covariance matrix is speciﬁed by
HOM
This code speciﬁes homogeneous error covariances for all values of the control variable.
HET
This code speciﬁes heterogeneous error covariances, with the covariance function a step function of the control variable. It should be followed (space separated) by an integer number giving the number of steps.
N.B.: ‘Step intervals’ must cover the complete range of values for the control variable encountered in the data.
EXAMPLE: For a univariate RR analysis with values of the control variable ranging from 1 to 5, say we want to ﬁt separate residual variances for 1 & 2, 3 & 4 and 5:
VAR residual 1 HET 3 1 2 1.234 3 4 3.456 5 5 5.678
Similarly, for each covariance matrix due to random eﬀects to be estimated, a ‘header’ line and the elements of the covariance matrix need to be speciﬁed.
Again, a qualiﬁer DIAG or NOSTART, as described above for the residual covariance matrix (see subsubsection 4.11.1.1) is recognized.
As above, if DIAG is speciﬁed, only the q variances are read and with NOSTART no values are acquired.
In addition, users can deﬁne other functions of covariance components which should be calculated and for which sampling errors should be approximated. This is done in a block entry, beginning with a line containing the code SE+USR (can be abbreviated to SE+), and ending with a line beginning with END. The block should then contain one line for each function to be calculated. The content of the line depends on the type of function. Three types are recognised.
EXAMPLE:
SE+USR SUM siga+pe 1 2 VRA repeat 5 4 ENDConsider a univariate analysis with repeated records per individual. Fitting additive genetic and permanent environmental eﬀects of the animal, variance components in the model are σ_{A}^{2}, σ_{ PE}^{2} and σ_{ E}^{2} (residual), with running numbers 1, 2 and 3, respectively. WOMBAT automatically calculates the phenotypic variance σ_{P}^{2}= σ_{A}^{2} + σ_{ PE}^{2} + σ_{ E}^{2} and gives it running number 4. To calculate the repeatability and approximate its sampling error, we ﬁrst need to deﬁne the sum of σ_{A}^{2} and σ_{ PE}^{2} as a new covariance (which receives running number 5), and then deﬁne the repeatability as the ratio of this component and σ_{P}^{2}.
HINT: Run WOMBAT with the setup option to begin with. Inspect the ﬁle ListOfCovs generated in this step – this gives a list of running numbers for individual covariance components.
Information related to pooling of covariance components using a (penalized) maximum likelihood approach is speciﬁed in a block entry, beginning with a line containing the code POOL and ending with a line beginning with END. Within the block, the following directives are recognized:
PSEUPED
followed (space separated) by a three letter code specifying the assumed pedigree structure and values on sizes or numbers of families. The following pseudopedigree codes are available:
This option is suitable for a simple animal model only. WOMBAT will check that a) there is only one random eﬀect ﬁtted, and b) that this has covariance option NRM. If the MINPAR option (see below) is used, WOMBAT cannot perform these checks; hence the DIRADD code together with the name of the random eﬀect, as described below, need to be given.
EXAMPLE:
MODEL RAN animal NRM RAN gmdam NRM RAN pedam IDE trait bwgt 1 trait wwgt 2 trait ywgt 3 END MOD VAR animal 3 NOS VAR gmdam 3 NOS VAR pedam 3 NOS VAR residual 3 NOS POOL PSEUPED BON 10 DIRADD animal MATADD gmdam MATPE pedam SMALL 0.001 DELTAL 0.005 END
EXAMPLE:
MODEL RAN animal NRM ... END MOD VAR animal 4 NOSTART VAR residual 4 NOSTART POOL PSEUPED USR 5 1.0 0.50 0.50 0.50 0.50 1.0 0.25 0.25 0.25 1.0 0.25 0.25 1.0 0.25 1.0 ENDThis shows the coeﬃcients for direct additive genetic eﬀects for a family comprising a sire (individual 1) with four progeny from unrelated dams.
SMALL
followed (space separated) by a real number which gives the lower limit (≤ 1) for smallest eigenvalue allowed in the combined matrices. If this is not speciﬁed, a default value of 0.0001 is used.
DELTAL
followed (space separated) by a real number specifying the convergence criterion: if the increase in log likelihood between iterates falls below this value, convergence is assumed to have been achieved. If not speciﬁed a stringent default of 0.00005 is used.
MINPAR
The default assumption for the parameter ﬁle used is that it is a ‘full’ parameter ﬁle as for a corresponding, multivariate analysis. However, the information on pedigree and data ﬁle and their layout are not actually required. Hence the option MINPAR (on a line by itself) is provided which allows use of a ‘minimum’ parameter ﬁle, switching oﬀ some of the consistency checks otherwise carried out. If used, MINPAR must be the ﬁrst entry within the POOL block.
The minimum information to be given in the parameter ﬁle must comprise:
EXAMPLE:
ANAL MUV 14 VAR animal 14 NOSTART VAR residual 14 NOSTART POOL MINPAR SMALL 0.001d0 PSEUPED hfs 100 10 4 DIRADD animal END
SINGLE
The default form of input for results from part analysis is to read estimates from separate ﬁles, in the form of output generated by WOMBAT when carrying out multiple analyses of subsets of traits. Alternatively, all information can be given in a single ﬁle. This is selected by the option SINGLE, followed (space separated) by the name of the input ﬁle (same line). The layout required for this ﬁle is described in subsubsection 6.6.4.2.
PENALTY
followed (space separated) by a code word(s) deﬁning the type and strength of penalty to be applied. Codes for the penalty type recognised are:
Either COVARM or CORREL can be followed (space separated) by the keyword MAKETAR. If this is given, WOMBAT determines the shrinkage target as the phenotypic covariance (or correlation) matrix obtained by summing estimates of covariances for all sources of variation from the preceding, unpenalized analysis. If this is not given, the upper triangle of the target matrix is expected to be read from a ﬁle with the standard name PenTargetMatrix; see subsubsection 6.6.7.1.
The last entry on the line relates to the tuning factor(s) to be used: If a single penalized analysis is to be carried out, the corresponding tuning factor should be given (real value). To specify multiple penalized analyses, specify the number of separate tuning factors multiplied by −1 as an integer value (e.g. 3 means 3 analyses), and list the corresponding tuning factors space separated on the next line.
EXAMPLE:
 1.
 Shrink all correlation matrices towards the phenotypic correlation matrix using a single tuning factor of 0.1, and calculate the shrinkage target from unpenalized results.
PENALTY CORREL MAKETAR 0.1
 2.
 Shrink canonical eigenvalues on the logarithm scale towards their mean, using 5 diﬀerent tuning factors
PENALTY CANEIG LOG 5 0.01 0.1 0.5 1.0 2.0
WOMBAT requires a coeﬃcient matrix in the mixed model equations which is of full rank. Hence, constraints must be placed on the ﬁxed eﬀects part of the model if more than one ﬁxed eﬀect is ﬁtted. By default, the ﬁrst level of each crossclassiﬁed ﬁxed eﬀect other than the ﬁrst is ‘zeroed out’ for each trait to account for dependencies. If there are additional dependencies, these should be identiﬁed and speciﬁed explicitly prior to each analysis.
WOMBAT performs a simple leastsquares analysis on the ﬁxed eﬀects part of the model, attempting to ﬁnd such additional dependencies. However, this procedure should not be relied upon, in particular for large data sets where numerical errors tend to accumulate suﬃciently to obscure identiﬁcation. Dependencies not properly taken into account can lead to problems during estimation !
Additional eﬀects to be zeroed out are speciﬁed in a block entry. The block begins with a line containing the code ZEROUT (can be abbreviated to ZER), and ﬁnishes with a line beginning with END. The block then should contain one line for each additional dependency. Each line should contain three entries :
In some instances, it is desirable to treat selected levels of a random, genetic eﬀect as if it were ‘ﬁxed’. A typical example is the analysis of dairy data under a sire model. If the data includes highly selected proven bulls which only have records from their second crop of daughters, we might want to treat these bulls as ‘ﬁxed’ as we would expect estimates of the genetic variance to be biased downwards otherwise.
Treating selected random genetic eﬀects levels as ﬁxed is achieved by replacing the diagonal in the numerator relationship matrix by a value of zero when setting up the inverse of the numerator relationship matrix. If there is any pedigree information for the eﬀect, this is ignored.
N.B.: Treating levels of random eﬀect(s) as ﬁxed may lead to additional dependencies in the ﬁxed eﬀects part of the model, especially for multivariate analyses. Great care must be taken to identify these and specify them explicitly, as outlined above (section 4.14). WOMBAT has no provision to account for this possibility !
Random genetic eﬀects levels to treated as ﬁxed are speciﬁed in a block entry. The block begins with a line containing the code PSEUDOFX (can be abbreviated to PSE). This can be followed (space separated) by a real number. If given, this value (which should be a small positive number, e.g. 0.0001 or 0.001) is used instead of the value of 0 when specifying the contributions to the inverse of the numerator relationship matrix. Eﬀectively, this treats the respective eﬀect level as “just a little bit random”. This option is provided as a mean to counteract additional dependencies in the model (see warning above). It is particularly useful for prediction, and should be used very carefully with estimation runs. As usual, the block ﬁnishes with a line beginning with END. The block then should contain one line for each eﬀect. Each line should contain two entries :
N.B.: This option has only undergone fairly rudimentary testing !
It is not available in conjunction with the PXEM algorithm.
Selecting this option prohibits reuse of inverse NRM matrices set up in any previous runs.
WOMBAT provides limited facilities to ﬁx certain covariance components at their input values, while maximising the (log) likelihood with respect to the remaining parameters. Again, this information is given in a block entry. The block begins with a line containing the code FIXVAR (no abbreviation) and, as usual, ends with a line containing the code END. Each line within the block is then expected to have the following entries:
In the moment, no options to ﬁx individual elements of covariance matrices are recognised.
A place to collect miscellaneous options which do not ﬁt into the categories above is provided through a block entry beginning with a line SPECIAL, and ending with a line END.
Options are available to address the following problems:
A weighted analysis is speciﬁed by a line with the following space separated entries:
EXAMPLE: Consider an animal with two records and corresponding weights w_{1} = 0.8 and w_{2} = 2. Let the unweighted residual covariance matrix be
The weighted matrix is then
WOMBAT now checks for covariables which have a value of zero. If you ﬁt a covariable which has such values, it expects you to conﬁrm that this is intentional. This is done through a single line with space separated entries:
WOMBAT will stop if it encounters nonconsecutive records for the same subject in diﬀerent parts of the data ﬁle, assuming that the data has not been sorted as required. However, there are situations where this is a ‘planned’ scenario, for example to ﬁt data on clones, where we want the same genetic eﬀect but diﬀerent residual eﬀects for members of a clone. Hence, this check can be switched oﬀ by including a line with the code CLONES in the SPECIAL block.
WOMBAT reports solutions for ﬁxed and random eﬀects ﬁtted – these are calculated as a byproduct by all algorithms to obtain REML estimates of variance components. However, standard deviations/errors are only reported for those algorithms which invert the coeﬃcient matrix in the mixed model equations. An option is provided to enforce this – this is invoked by a line (in a SPECIAL block) with the code FORCESE. If standard errors have been obtained automatically, this has no eﬀect. Otherwise, it forces calculation of the inverse of the coeﬃcient matrix at convergence.
N.B.: For large analyses, this can take quite some extra time.
For analyses incorporating genomic information in the relationship matrix for additive genetic eﬀects, WOMBAT provides the opportunity to ‘back solve’ from predicted breeding values for marker eﬀects and their standard error at the end of a BLUP or REML run, providing a GWAS scan; see Aguilar et al. [1]. This is invoked by a line (in a SPECIAL block) with the code BSOLVESNP rname, where rname is the name of the random eﬀect representing additive genetic eﬀects.
Details about additional information and input ﬁles needed and the output generated are available with Example 22.
WOMBAT attempts to check the data for records representing repeated measurements for a trait and individual, and stops when this is fairly low. This can be overridden to some extent: Analyses with repeated records for a trait for less than 20% and more than 10% of individuals can be enabled by a single line with the code REPEAT at the beginning of the line.
Standard multivariate analyses (MUV) with repeated records can represent a somewhat diﬃcult scenario, as proper modelling of the residual covariance structure relies on diﬀerentiating between records for diﬀerent traits are taken simultaneously and thus should be assumed to be subject to a common,temporary environmental (residual) covariance, and which should not. There are four strategies which can be chosen by specifying a single line with the following space separated entries:
EXAMPLE:
SPECIAL RPTCOV TSELECT mtime END
Speciﬁcation of this option is required for multivariate analyses combining traits with single and repeated records.
When using the run option snap, the covariable in the model representing SNP eﬀects needs to be identiﬁed by a line in a SPECIAL block containing
In addition, a COVZER statement may be required to ensure WOMBAT treats count of zero as valid covariable values.
For ‘housekeeping’ reasons, WOMBAT requires that these covariables are speciﬁed (in the MODEL block) before any other, ‘regular’ covariables to be ﬁtted.
EXAMPLE:
SPECIAL SNPEFF snp(1) COVZER snp(1) FIT END
The default in drawing samples from distribution of estimates of covariance matrices is to sample the parameters estimated, i.e. the elements of the Cholesky factor of the matrix of interest, and to construct the matrix samples from these. This yields samples within the parameter space. However, it can also yield substantial diﬀerences in the mean of samples and their variances from the ‘input’ values. Hence an option is provided to select sampling of the covariance matrices instead. This will yield means close to the REML estimates and estimates of sampling variances closer to those derived from the inverse of the average information matrix, but is likely to yield samples out of the parameter space.
To select a single covariance matrix to be sampled when approximating sampling variances (see subsection 5.2.9) requires a single line in a SPECIAL block with at least two entries:
EXAMPLE:
SPECIAL SAMPLEAI animal END
To ﬁt a model with an additional random eﬀect representing individuals’ competitive genetic eﬀects, a random regression model analysis has to be selected – ﬁtting random eﬀects at an order of 1 with a basis function of “1” (option ONE) yields an equivalent model to a standard, univariate analysis whilst providing the ﬂexibility to model residual variances as heterogeneous. WOMBAT then requires a “social” group code to be supplied and collects the identities of the individuals in each group as part of its setup steps.
Additional information required is to be supplied by a single line in the SPECIAL block. This should contain the following, space separated entries:
For each run where a dilution factor has been speciﬁed, WOMBAT appends the value of the dilution factor used together with the corresponding maximum log likelihood to a ﬁle in the working directory named LogL4Quapprox.dat (see subsection 7.3.8). These represent points on the proﬁle likelihood curve for the dilution factor, and a quadratic approximation can be used to determine its maximum. The run time option quapp is provided to assist in this task.
When unknown parent groups (UPG) are to be treated as random, an alternative to including them in the inverse of the relationship matrix among additive genetic eﬀects [see 48] is to ﬁt them explicitly, i.e. to ﬁt them as a separate random eﬀect.

with y, b, u, g and e denoting the vectors of observations, ﬁxed eﬀects, random eﬀects, group eﬀects and residuals, respectively, and X and Z the corresponding design matrices. Q is the matrix linking individuals’ additive genetic eﬀects to groups, with each row of Q potentially containing multiple nonzero elements, tracing the proportional contribution of genetic groups through the pedigree, which sum to unity. WOMBAT allows for Q to be supplied by the user or can set up Q in core if group codes for missing sire and dam identities are given.
Instructing WOMBAT to ﬁt genetic groups thus requires these to be speciﬁed as a random eﬀect in the model of analysis, and the corresponding (co)variance components statements need to be provided. In addition, an entry in a SPECIAL block at the bottom of the parameter ﬁle is needed. This should consist of a single line, with the following, space separated entries:
For example,
EXAMPLE:
MODEL ... # specify animal genetic effects first RAN animal GIN RAN gggrps IDE ... END ... SPECIAL # specify which random effect represents groups, # no. of levels and scale factor for proportions GENGROUPS gggrps 27 1 animal ... END
Depending on the covariance option for additive genetic eﬀects, information on Q or the UPG codes is expected to be read either from the corresponding pedigree ﬁle (NRM), starting with column 5 or from the *.codes ﬁle (GIN), beginning in column 4.
The elements of the rows of Q are read as a string of g space separated values (with g the number of groups as given in the SPECIAL block) for each individual. These values are assumed to have been obtained by multiplying the proportions with the scale factor speciﬁed.
For example, for g = 5 and a scale factor of 1000, the *.codes ﬁle might be:
EXAMPLE:
1 2301 1 900 0 100 0 0 2 2302 1 0 900 0 0 100 3 2303 2 100 0 800 100 0 4 2304 2 0 0 0 0 1000 ...
with columns 1 and 2 the running number and original animal code, column 3 the type of animal (1 = non genotyped, 2 = genotyped; for compatibility, this code needs to be included for all run options if genetic groups are ﬁtted, including those that do not treat genotyped animals diﬀerently). Columns 4 to 8 then give the proportions of group membership, multiplied by the scale of 1000 for each of the 5 groups.
If it is chosen to supply UPG codes instead, these are expected to be given as columns 5 (sire) and 6 (dam) in the pedigree or columns 4 (sire) and 5 (dam) in the *.codes ﬁle. NB: For this option the genetic group codes need to be coded from 1 to the number of groups
The default for WOMBAT is to repeatedly read data and relationship information from disk so as to minimize memory requirements. Incore storage of selected information can be speciﬁed with a line in a SPECIAL block. This should begin with the code INCORE followed (space separated) by one or more qualiﬁer(s) choosing which information is to be stored in core. Valid qualiﬁers are NRM (inverse of numerator relationship matrix), GIN (*.gin matrix), DATA (recoded data), RAW (raw data), ZRE (*.matrix or *.chlsky required or reduced rank analyses usng AIREML) and ALL (all ﬁve).
EXAMPLE:
SPECIAL ... INCORE GIN DATA ... END
Calculation of alternative outlier model (AOM) statistics for residuals or random eﬀects ﬁtted is invoked by specifying the option AOMRES or AOMRND by itself on a line in a SPECIAL block in the parameter ﬁle.
EXAMPLE:
SPECIAL ... AOMRES ... END
The AOM options are implemented for uni and full rank standard multivariate analyses only; they are ignored for reduced rank (PC) analyses and random regression models. Calculations are carried out for one observation or random eﬀects level at a time. For large models or cases with dense coeﬃcient matrices C, this can take quite a long time.
AOM statistics for residual eﬀects, R^{−1}e∕, are reported together with residuals in Residuals.dat. This ﬁle has one row per record with up to seven columns. Columns 1 to 3 are written out for all analyses and contain the residual, the predicted observation and the actual observation (for reference). Specifying AOMRES adds the diagonal element of the projection matrix (p_{ii}; column 4), the respective element of R^{−1}e (the nominator of the AOM statistic; column 5) and the AOM statistic (column 6). For univariate analyses, the corresponding diagonal element of the “Hat” matrix (σ^{2}[1 − σ^{2}p_{ii}] with σ^{2} the residual variance) is given in addition (column 7).
AOM statistics for random eﬀects, G^{−1}u∕, are reported alongside the corresponding solutions in the RnSoln_*.dat ﬁles. This ﬁle has up to 10 columns. If AOMRND has been speciﬁed, the last three columns give the diagonal element of the covariance matrix (T_ii), the element of G^{−1}u ((G{1}u)_i) and the AOM statistic, respectively. Where the denominator is essentially zero (e.g. for additive genetic eﬀects for parent without records), the AOM statistics is set to zero.
Kernel based prediction using genomic information is attracting increasing interest. This can be parameterised as
To facilitate estimation of the bandwidth parameter, WOMBAT has been adapted to calculate the inverse of the kernel matrix for a chosen value of 𝜃, with d_{ij} the ’distance’ for individuals i and j.
To ﬁt a Gaussian kernel matrix as ‘relationship matrix’, the random eﬀect which this matrix pertains to needs to be speciﬁed with the covariance option GIN. The corresponding *.gin ﬁle should follow the same format as for other eﬀects with such covariance structure, except that the elements given should be the squared distances, d_{ij}^{2} and that the diagonals of 0 can be omitted. Only rudimentary checks of the input values are performed, e.g. WOMBAT stops if a zero distance between a pair of individuals is found, as this will result in a nonpositive deﬁnite matrix.
The bandwidth parameter to be used needs to be speciﬁed in a SPECIAL block. This should contain a line starting with the code KERNEL followed (spaceseparated) by the name of the random eﬀect as given in the MODEL block, and the value of 𝜃. E.g. for random eﬀect animal and 𝜃 = 1.5
EXAMPLE:
SPECIAL ... KERNEL animal 1.5 ... END
WOMBAT will then construct the kernel matrix for the given value of 𝜃 and attempt to invert it, as well as calculate its log determinant. K is assumed to be dense and is automatically stored ‘incore’. Inversion is carried out using LAPACK [3] routine DPFTRF [19] which requires a positive deﬁnite matrix.
On completion of the estimation step for a given value of 𝜃, it is written to the ﬁle LogL4Quapprox.dat together with the corresponding maximum log likelihood. To determine the next value on the proﬁle likelihood for 𝜃 to be evaluated, WOMBAT can be run with the command line option quapp (see subsection 5.2.12) which performs a quadratic approximation of the proﬁle log likelihood for 𝜃.
By default, WOMBAT calculates and writes out a small number of correlations on the ’original scale’ for the estimated covariance functions. These are primarily given to assist with checking your own calculations to obtain the correlations you are interested in  covariances and correlations are easily calculated using the ﬁle of evaluated basis functions provided.
The complete correlation matrix is not provided by default as this might result in big output ﬁles and many calculations needed. However, if you really want correlations for all values of the control value, you can specify this by a single line in a SPECIAL block containing the code RRCORRALL. If the analysis ﬁts a single control variable, this writes out not only RanRegCorrels.dat, but also writes an additional ﬁle RanRegCorrAll.dat (see subsection 7.2.4) with the same information in a format more convenient for further processing, e.g. plotting.
WOMBAT has a special module for a preanalysis step to carry out a number of calculations related to genomic relationship matrices and single step analyses.
Documentation on the various options available is given in a separate pdf ﬁle distributed with Example 20.
WOMBAT can carry out estimation imposing a penalty on the log likelihood function to improve the sampling properties of estimates. Three types of penalties are accommodated:
As the canonical transformation involves only two matrices, this type of penalty is restricted to multivariate analyses ﬁtting a simple animal model or random regression analyses ﬁtting animals’ genetic and permanent environmental eﬀects only.
Penalized estimation is invoked by one or more lines in a SPECIAL block. It generates several additional output ﬁles, described in subsection 7.2.11.
Penalized estimation is speciﬁed by a line with space separated entries:
The following variables depend on the form of penalty chosen.
For CANEIG:
This variable is optional; if omitted, it is set to the same value as that given for the scale of the penalty (LOG for LOGB).
EXAMPLE:
SPECIAL PENALTY CANEIG LOG 5.0 ENDspeciﬁes a penalty on the canonical eigenvalues transformed to logarithmic scale for a tuning factor of ψ = 5.0.
EXAMPLE:
SPECIAL PENALTY CANEIG ORG 6 0.5 1.0 2.5 5.2 8.7 11.0 END
EXAMPLE:
SPECIAL PENALTY CANEIG LOG TIC 2 0.0 3.0 0.5 5.0 8.0 1.0 END
All values of ψ given should be in ascending order.
For COVARM and CORREL:
EXAMPLE:
SPECIAL PENALTY COVARM PHENV animal 0.0 END
with T the shrinkage target and C = (ψ + q + 1)∕ψ. The latter value originates from the assumption of an Inverse Wishart distribution for Σ. This option sets C = 1, i.e. applies a penalty proportional to the KullbackLeibler divergence between Σ and T.
EXAMPLE:
SPECIAL PENALTY COVARM KLDIV animal 7.50 END
EXAMPLE:
SPECIAL PENALTY CORREL animal 1 0.0 5.0 1.0 END
For KANEIG: see example 19 For KORREL: see example 19 For PACORR:
If omitted, the default shrinkage target is the identity matrix.
EXAMPLE:
SPECIAL PENALTY PACORR animal 4.0 END
If multiple tuning factors are given, it is sometimes desirable to stop once the deviation in the unpenalized log likelihood from the maximum (at ψ = 0) has exceeded a certain limit. This can be speciﬁed by an additional line with three spaceseparated entries:
EXAMPLE:
SPECIAL PENALTY CORREL KLDIV animal 2 0.0 5.0 0.5 6.0 15.0 1.0 PENALTY LIMIT 3.315 END
For this option to work, the ﬁrst tuning factor given has to be ψ = 0!
HINT: Use run time option –valid (subsection 5.3.2) for validation steps when using crossvalidation to estimate the value for ψ to be used.