All information on the the model of analysis and the data and pedigree files is specified through the parameter file.
The name of the parameter file should have extension ’.par’. By default,
WOMBAT expects to find a file wombat.par in the current working directory.
Other filenames can be specified at run time as the last command line option – see
chapter 5 for details.
Setting up the parameter file is straightforward, but care and attention to detail are required to get it ‘just right’. The worked examples give ‘templates’ for various types of analyses which are readily adaptable to other problems. Parsing of the lines in the parameter file is fairly elementary, hence it is important to adhere strictly to the format described in the following in painstaking detail. Note also that parsing relies on the parameter file being a ‘plain’ text file and not containing any ‘non-printable’ characters – e.g. do not use TAB characters instead of white spaces.
WOMBAT performs a limited number of consistency checks on the variables and models specified, but these are by no means exhaustive and should not be relied upon!
If WOMBAT does find an obvious error, it will stop with a message like “exit WOMBAT” or, in verbose mode, “Programmed(error) stop for WOMBAT encountered”. Inconsistencies or errors not discovered are likely to wreak havoc in subsequent steps !
HINT: Use the -v option at run time. This will cause WOMBAT to echo each line in the parameter file as it is read (to the screen) – if there is a programmed error stop at this stage, it is easier to find the mistake as you know which is the offending line.
The parameter file is read line by line. Each line is assumed to be 88 characters long, i.e. anything in columns 89 onwards is ignored.
Any line with a # in column 1 is considered to be a comment line and is skipped.
WOMBAT relies on specific codes at the beginning of each line (leading blank spaces are ignored) to distinguish between various types of information given. Valid codes are summarised in 4.1. Most codes can be abbreviated to 3 letters. The code and the information following it should be separated by space(s). Codes are not case sensitive, i.e. can be given in either upper or lower case letters.
Depending on the initial code, WOMBAT expects further information on the same or following lines. All variable and file names specified are treated as case sensitive. File names can be up to 30 characters long, and variable names can comprise up to 20 characters. All codes, names and variables must be separated by spaces.
The parameter file can have two different types of entries :
EXAMPLE:
PEDS pedigrees.dat
Here PEDS is the code for pedigree information, “pedigrees.dat” is the name of the file from which pedigree information is to be read.
EXAMPLE:
MODEL FIX CGroup RAN Animal NRM TRA Weight END MODELThis shows a block entry for the model of analysis, where the model fits CGroup as a crossclassified, fixed effect and Animal as random effect with covariance matrix proportional to the numerator relationship matrix, and Weight is the trait to be analysed.
Different entries should be specified in the order in which they are listed in the following sections.
Code | Within | Indicator for |
RUNOP | – | Line with run options [4.3 ] |
COMMENT | – | Comment line [4.4 ] |
OVERRIDE | – | Change “hard-coded”program limit [4.5 ] |
ANALYSIS | – | Code for analysis type [4.6 ] |
PEDS | – | Name of pedigree file [4.7 ] |
MRKFILE | – | Name of file with marker counts [4.8 ] |
DATA | – | Name of data file [4.9 ] |
TRNOS | DAT | Trait numbers (grouped input) [4.9.2 ] |
NAMES | DAT | Multiple column names (grouped input) [4.9.2 ] |
MODEL | – | Model of analysis [4.10 ] |
FIX | MOD | Name of fixed effect [4.10.2 ] |
COV | MOD | Name of fixed covariable [4.10.3 ] |
RAN | MOD | Name of random effect [4.10.4 ] |
RRC | MOD | Name of control variable [4.10.4.1 ] |
SUBJ | MOD | Name of variable identifying “subject” [4.10.5 ] |
EXT | MOD | Name extra variable [4.10.6 ] |
TRAIT | MOD | Trait name [4.10.7 ] |
RESIDUAL | – | Residual covariance matrix [4.11.1 ] |
ERROR | – | Residual covariance matrix [4.11.1 ] |
VARIANCE | – | Covariance matrix: random effects [4.11.2 ] |
SE+USR | Additional, user defined functions of covariances [4.12 ] | |
SUM | SE+ | Weighted sum |
VRA | SE+ | Variance ratio |
COR | SE+ | Correlation |
POOL | – | Options for pooling of covariance components [4.13 ] |
PSEUPED | POOL | Pseudo pedigree structure |
SMALL | POOL | Minimum eigenvalue |
ZEROUT | Fixed effects levels to be set to zero [4.14 ] | |
PSEUDOFX | Random effects levels to be treated as random [4.15 ] | |
FIXVAR | – | Fix covariance components [4.16 ] |
END |
Code | Within | Indicator for |
SPECIAL | – | Miscellaneous options [4.17 ] |
WEIGHT | SPECIAL | Weighted analysis [4.17.1 ] |
COVZER | SPECIAL | How to deal with ‘zero’ covariables [4.17.2 ] |
CLONES | SPECIAL | Ignore sorting of data file [4.17.3 ] |
FORCE-SE | SPECIAL | Force calculation of standard errors [4.17.4 ] |
BSOLVE-SNP | SPECIAL | Invoke ‘back-solving’ for SNP effects [4.17.5 ] |
REPEAT | SPECIAL | Run with low proportion of repeated records [4.17.6 ] |
RPTCOV | SPECIAL | Specify residual covariance structure [4.17.6 ] |
QTLEFF | SPECIAL | Specify which covariable is a QTL/SNP effect [4.17.7 ] |
SAMPLEAI | SPECIAL | Sampling to approximate standard errors [4.17.8 ] |
SOCIAL | SPECIAL | Analysis with ’social’ genetic effect [4.17.9 ] |
GENGROUPS | SPECIAL | Fit ‘explicit’ genetic groups [4.17.10 ] |
INCORE | SPECIAL | Analysis storing info in core [4.17.11 ] |
AOM-RES | SPECIAL | Calculate outlier statistics for residuals [4.17.12 ] |
KERNEL | SPECIAL | Gaussian kernel [4.17.13 ] |
RRCORR-ALL | SPECIAL | Alll correlations for RR analyses [4.17.14 ] |
HINVERSE | SPECIAL | Genomic and joint relationship matrices [4.17.15 ] |
PENALTY | SPECIAL, POOL | Invoke penalized estimation [4.17.16 ] |
While run time options (see chapter 5) are primarily intended to be read from the command line, WOMBAT also provides the facility to specify such options on the first line of the parameter file. To be recognised, this line must start with the code RUNOP and, after space(s), all options are expected to be given on the same line (space separated), in the same form as on the command line.
WOMBAT allows for a single, optional comment line (up to 74 characters) to be specified. This is specified by the code COMMENT (can be abbreviated to COM) at the beginning of the line. Anything following the spaces after COM(MENT) is treated as comment on the analysis. It is printed in some of the output files generated by WOMBAT to assist the user, and has no other use.
WOMBAT assigns most workspaces used dynamically. However, there are a number of dimensions which require an initial value and these have been assigned default values. Their settings can be inspected by running WOMBAT with run option --limits (see subsection 5.2.15). Occasionally, these defaults are too small or too large for the intended analysis or the hardware available. Hence an override mechanism is provided through a line in the parameter file consisting of the code OVERRIDE (can be abbreviated to OVER) at the beginning of the line followed (space separated) by the name of the variable to be changed and its new value. There should be one such line for each variable to be changed. Variable names recognised are:
These names can be abbreviated to the first six letters. Values for MAXANIS and MAXEQNS apply to REML analyses or BLUP runs with direct solution for the mixed model equations – the limits for BLUP analyses via iterative solutions are obtained by multiplying the respective values with a factor of 20.
N.B.: Remember to put any such lines close to the top of the parameter file – so that they are read prior to parts specifying the data layout or model of analysis.
This is a single line entry beginning with code ANALYSIS (can be abbreviated to ANA), followed by a two- or three-letter code describing the type of analysis. The following codes are recognised :
Except for UNI, this can be followed (after space(s)) by the code PC to select an analysis which fits the leading principal components only for some of the random effects fitted and yields reduced rank estimates of the corresponding covariance matrices. For MUV and MRR the number of traits in the analysis must be given as well – this should be the last entry of the line.
EXAMPLE:
ANALYSIS MUV PC 8
specifies a reduced rank, multivariate analysis for 8 traits
This is a single line entry beginning with code PEDS (can be abbreviated to PED), followed by the name of the pedigree file. There is no default name. This entry is ‘optional’, in such that it is only required if a code of NRM if specified for the covariance structure of a random effect (see subsection 4.10.4). Only one pedigree file can be given. The format of the pedigree file required by WOMBAT is described in detail in section 6.3.
By default, WOMBAT fits an animal model. If a sire model is to be fitted, the code SIREMODEL (can be abbreviated to SIR) should be given after the filename.
Additional options, given after the filename and, if applicable the sire model option, that are recognised are +INBR or +SEX. These instruct WOMBAT to read animals’ inbreeding coefficients or sex codes (1=X/2=XX) from the fourth of fifth column in the pedigree file. If given, +INBR causes WOMBAT to skip calculation of inbreeding coefficients and use the values supplied instead. Option +SEX is required if an NRM for X-linked genetic effects is to be set up.
This is a single line entry beginning with code MRKFILE (can be abbreviated to MRK), followed by the name of the file. There is no default name. This entry is ‘optional’, i.e. it is only required for selected analyses where the default name MarkerCounts.dat is to be replaced. The layout of the marker file required by WOMBAT is described in detail in section 6.4.
This is a block entry. The block begins with a line containing the code DATA (can be abbreviated to DAT) followed by the name of the data file. There is no default name. The general form of the data file required in described in section 6.2.
The following lines specify the record layout of the data file for all traits. There are two alternative ways of specification.
For each trait in turn, there should be one parameter file line for each column, up to
the last column used in the analysis.
The lines can have up to 3 elements :
The block is terminated by a line with the code END.
EXAMPLE:
DATA mydata.dat TR1 traitno 2 TR1 animal 1000 TR1 fixeffect 50 TR1 weight TR2 traitno 2 TR2 animal 500 TR2 fixeffect 30 TR2 feedintake END DATAThis shows the block for an analysis reading records for 2 traits from the file mydata.dat.
If there are several traits for which the record layout is the same, the respective
record layout can be given for the whole group of traits. This avoids tedious
duplication of lines.
This alternative is selected by placing the code GRP after the name of the data file
(same line, separated by a space).
For each group of traits, the following lines need to be given :
Again, the block is terminated by a line with the code END.
EXAMPLE:
DATA mydata.dat GRP TRNOS 1 2 traitno 2 animal 1000 fixeffect 50 NAMES weight feedintake END DATAThis shows the ‘grouped’ alternative for specifying the data file layout, for two traits with the same column structure in the example above.
This is another block entry. The block begins with a line containing the code MODEL (can be abbreviated to MOD), and finishes with a line beginning with END. The block then should contain one line for each effect to be fitted and one line for each trait in the analysis.
Each of the ‘effect’ lines comprises the following
Fixed effects can be cross-classified or nested fixed effects or covariables. The following codes are recognised :
FIX
This specifies a fixed effect in the model of analysis.
NB The name for a fixed effect should not contain a “(”, otherwise it is assumed that this effect is a covariable.
A simple, one-way interaction of two variables can be specified as vn1*vn2, with vn1 and vn2 valid variables names. [Not yet implemented !]
HINT: ‘Not implemented’ here means merely that WOMBAT will not code the interaction for you – you can, of course, still fit a model with an interaction effect, but you a) have to insert an additional column in the data file with the code for the appropriate subclass, b) fit this as if it were a crossclassified fixed effect, and c) specify any additional dependencies arising explicitly (using ZEROUT, see below).
COV
This specifies a fixed covariable. The effect name should have the form “vn(n,BAF)”, where vn is a variable name, the integer variable n gives the number of regression coefficients fitted and BAF stands for a three-letter code describing the basis functions to be used in the regression. NB: By default, intercepts for fixed covariables are not fitted.
Valid codes for basis functions are
N.B.: This yields a regression equation of the form
rather than an equation of form
This should be born in mind when interpreting any solutions for regression coefficients for POL covariables from WOMBAT - while there is a straightforward relationship between coefficients βi and bi, they are not interchangeable.
For example, n = 3 denotes a cubic polynomial, i.e. comprises a linear, quadratic and cubic regression coefficient, but no intercept (This differs from the implementation for random regressions where n = 4 denotes a cubic polynomial).
For analyses fitting spline functions, the degree of the spline is selected by specifying “L”, “Q” or “C” for linear, quadratic and cubic, respectively, immediately (no space) after the code BSP. Note that the default spline function is an equi-distant B-spline (i.e. the range of values of the covariable is divided into intervals of equal size), with the number of knots and intervals determined from the number of regression coefficients and the degree of the spline (k = n − d + 1 where k is the number of knots and d is the degree, d = 1 for “L”, d = 2 for “Q” and d = 3 for “C”) [32, section 2.2]. Other spline functions are readily fitted as user-defined basis functions.
Fitting of an intercept (in addition to deviation from means) can be enforced by preceding n with a minus sign – this is not recommended unless there are no other fixed effects in the model.
A covariable to be fitted as nested within a fixed effect is specified as “vn1*vn2(n,BAF)”, with vn1 the name of the fixed effect. If vn1 is not fitted as a fixed effect, it must be specified an an ‘extra’ effect (see below).
WOMBAT is fussy when it encounters a covariable which has a value of zero: Covariables which can take such value are only permitted if a SPECIAL option is given which confirms that these are valid codes and not ‘missing’ values; see subsection 4.17.2.
Random effects include the ‘control variables’, i.e. random covariables for random regression analyses. The following codes are recognised:
RAN
This code specifies a random effect. It should be followed (space separated) by
the name of the random effect. After the name, a three-letter code describing
the covariance structure of the effect can be given.
Valid codes for covariance structures are :
N.B.: Do not use this option for other effects - WOMBAT has no mechanism for checking that this option is appropriate.
For ’standard’ uni- and multivariate analyses, the random effect name is simply
the variable name as given for data file.
For random regression analyses, the variable name is augmented by information about the number of random regression coefficients for this effect and the basis functions used. It becomes “vn(n,BAF)”, analogous to the specification for covariables above. As above, n specifies the number of regression coefficients to be fitted. In contrast to fixed covariables, however, an intercept is always fitted. This implies that n gives the order, not the degree of fit. For instance, n = 3 in conjunction with a polynomial basis function specifies a quadratic polynomial with the 3 coefficients corresponding to the intercept, a linear and a quadratic term. WOMBAT allows for different control variables to be used to fit random regressions for different effect. If the model has more than one RRC statement (see below), the specification of random effects needs to be extended to tell WOMBAT which control variable to use for which effect, i.e. “vn(n,BAF,rrcn)” with rrcn the name of the control variable.
Valid codes for BAF are:
RRC
This code specifies a ‘control variable’ in a random regression analysis. It
should be followed (space separated) by the name of the variable, as given in
the DATA statement.
Optionally, immediately after the name (no spaces), the range of values of the
variable to be considered can be specified as (m − n), with m the lower and n
the upper limit.
N.B.: WOMBAT expects the value of the control variable to be non-negative (i.e. 0 or greater) and not to be a fraction (the control variable is read as a real variable but converted to the nearest integer, e.g. values of 3,.0, 3.1 and 3.4 would be treated as 3) – scale your values appropriately if necessary!
N.B.: For multivariate analyses, WOMBAT collects the range of values for the control variable (i.e. minimum and maximum) across all traits. This may be undesirable if different traits have distinct ranges and Legendre polynomials are used as basis function - if so, use the USR option and set up your own file which maps the range for each trait exactly as you want it!
For animal model analyses, WOMBAT assumes that the code for the first genetic effect fitted also identifies the subject on which measurements are taken. For some analyses (in particular those not fitting any additive genetic effects!) and sire models this is not appropriate, and such identifier needs to be supplied as an extra column in the data file.
SUBJ
This code needs to be followed by the name of the variable which identifies the individual.
For some models, coding is required for effects which are not explicitly fitted in the model of analysis, for instance, when fitting nested effects. These need to be specified as ‘extra’ effects.
EXT
This code, followed by the respective variable name, denotes an effect which is not fitted in the model but which is required to define other effects.
One line should be given for each trait. It should contain the following information :
HINT: All q traits in the analysis should be specified in this manner, even if k = m for some trait(s).
Next, the parameter file needs to specify values for all (co)variance components in the model of analysis. For variance component estimation, these are the starting values used. For simple BLUP analyses and simulation runs, these are the values assumed to be the population values.
The input matrices for full rank analyses must be positive definite, i.e. cannot have eigenvalues less than the operational zero. For reduced rank (PC) analyses, some ‘zero’ but no negative eigenvalues are acceptable, provided the rank (i.e. the number of non-zero eigenvalues) is equal to (or greater than) the number of principal components to be fitted.
The residual covariance matrix is specified by
Optionally, this can be followed (space separated) by a qualifier:
With qualifier: If DIAG is specified only the q diagonal elements (σ12, …, σq2) are read, and if NOSTART is given, no values are read.
Again, the residual covariance matrix is specified by
HOM
This code specifies homogeneous error covariances for all values of the control variable.
HET
This code specifies heterogeneous error covariances, with the covariance function a step function of the control variable. It should be followed (space separated) by an integer number giving the number of steps.
N.B.: ‘Step intervals’ must cover the complete range of values for the control variable encountered in the data.
EXAMPLE: For a univariate RR analysis with values of the control variable ranging from 1 to 5, say we want to fit separate residual variances for 1 & 2, 3 & 4 and 5:
VAR residual 1 HET 3 1 2 1.234 3 4 3.456 5 5 5.678
Similarly, for each covariance matrix due to random effects to be estimated, a ‘header’ line and the elements of the covariance matrix need to be specified.
Again, a qualifier DIAG or NOSTART, as described above for the residual covariance matrix (see subsubsection 4.11.1.1) is recognized.
As above, if DIAG is specified, only the q variances are read and with NOSTART no values are acquired.
In addition, users can define other functions of covariance components which should be calculated and for which sampling errors should be approximated. This is done in a block entry, beginning with a line containing the code SE+USR (can be abbreviated to SE+), and ending with a line beginning with END. The block should then contain one line for each function to be calculated. The content of the line depends on the type of function. Three types are recognised.
EXAMPLE:
SE+USR SUM siga+pe 1 2 VRA repeat 5 4 ENDConsider a univariate analysis with repeated records per individual. Fitting additive genetic and permanent environmental effects of the animal, variance components in the model are σA2, σ PE2 and σ E2 (residual), with running numbers 1, 2 and 3, respectively. WOMBAT automatically calculates the phenotypic variance σP2= σA2 + σ PE2 + σ E2 and gives it running number 4. To calculate the repeatability and approximate its sampling error, we first need to define the sum of σA2 and σ PE2 as a new covariance (which receives running number 5), and then define the repeatability as the ratio of this component and σP2.
HINT: Run WOMBAT with the --setup option to begin with. Inspect the file ListOfCovs generated in this step – this gives a list of running numbers for individual covariance components.
Information related to pooling of covariance components using a (penalized) maximum likelihood approach is specified in a block entry, beginning with a line containing the code POOL and ending with a line beginning with END. Within the block, the following directives are recognized:
PSEUPED
followed (space separated) by a three letter code specifying the assumed pedigree structure and values on sizes or numbers of families. The following pseudo-pedigree codes are available:
This option is suitable for a simple animal model only. WOMBAT will check that a) there is only one random effect fitted, and b) that this has covariance option NRM. If the MINPAR option (see below) is used, WOMBAT cannot perform these checks; hence the DIRADD code together with the name of the random effect, as described below, need to be given.
EXAMPLE:
MODEL RAN animal NRM RAN gmdam NRM RAN pedam IDE trait bwgt 1 trait wwgt 2 trait ywgt 3 END MOD VAR animal 3 NOS VAR gmdam 3 NOS VAR pedam 3 NOS VAR residual 3 NOS POOL PSEUPED BON 10 DIRADD animal MATADD gmdam MATPE pedam SMALL 0.001 DELTAL 0.005 END
EXAMPLE:
MODEL RAN animal NRM ... END MOD VAR animal 4 NOSTART VAR residual 4 NOSTART POOL PSEUPED USR 5 1.0 0.50 0.50 0.50 0.50 1.0 0.25 0.25 0.25 1.0 0.25 0.25 1.0 0.25 1.0 ENDThis shows the coefficients for direct additive genetic effects for a family comprising a sire (individual 1) with four progeny from unrelated dams.
SMALL
followed (space separated) by a real number which gives the lower limit (≤ 1) for smallest eigenvalue allowed in the combined matrices. If this is not specified, a default value of 0.0001 is used.
DELTAL
followed (space separated) by a real number specifying the convergence criterion: if the increase in log likelihood between iterates falls below this value, convergence is assumed to have been achieved. If not specified a stringent default of 0.00005 is used.
MINPAR
The default assumption for the parameter file used is that it is a ‘full’ parameter file as for a corresponding, multivariate analysis. However, the information on pedigree and data file and their layout are not actually required. Hence the option MINPAR (on a line by itself) is provided which allows use of a ‘minimum’ parameter file, switching off some of the consistency checks otherwise carried out. If used, MINPAR must be the first entry within the POOL block.
The minimum information to be given in the parameter file must comprise:
EXAMPLE:
ANAL MUV 14 VAR animal 14 NOSTART VAR residual 14 NOSTART POOL MINPAR SMALL 0.001d0 PSEUPED hfs 100 10 4 DIRADD animal END
SINGLE
The default form of input for results from part analysis is to read estimates from separate files, in the form of output generated by WOMBAT when carrying out multiple analyses of subsets of traits. Alternatively, all information can be given in a single file. This is selected by the option SINGLE, followed (space separated) by the name of the input file (same line). The layout required for this file is described in subsubsection 6.6.4.2.
PENALTY
followed (space separated) by a code word(s) defining the type and strength of penalty to be applied. Codes for the penalty type recognised are:
Either COVARM or CORREL can be followed (space separated) by the keyword MAKETAR. If this is given, WOMBAT determines the shrinkage target as the phenotypic covariance (or correlation) matrix obtained by summing estimates of covariances for all sources of variation from the preceding, unpenalized analysis. If this is not given, the upper triangle of the target matrix is expected to be read from a file with the standard name PenTargetMatrix; see subsubsection 6.6.7.1.
The last entry on the line relates to the tuning factor(s) to be used: If a single penalized analysis is to be carried out, the corresponding tuning factor should be given (real value). To specify multiple penalized analyses, specify the number of separate tuning factors multiplied by −1 as an integer value (e.g. -3 means 3 analyses), and list the corresponding tuning factors space separated on the next line.
EXAMPLE:
- 1.
- Shrink all correlation matrices towards the phenotypic correlation matrix using a single tuning factor of 0.1, and calculate the shrinkage target from unpenalized results.
PENALTY CORREL MAKETAR 0.1
- 2.
- Shrink canonical eigenvalues on the logarithm scale towards their mean, using 5 different tuning factors
PENALTY CANEIG LOG -5 0.01 0.1 0.5 1.0 2.0
WOMBAT requires a coefficient matrix in the mixed model equations which is of full rank. Hence, constraints must be placed on the fixed effects part of the model if more than one fixed effect is fitted. By default, the first level of each cross-classified fixed effect other than the first is ‘zeroed out’ for each trait to account for dependencies. If there are additional dependencies, these should be identified and specified explicitly prior to each analysis.
WOMBAT performs a simple least-squares analysis on the fixed effects part of the model, attempting to find such additional dependencies. However, this procedure should not be relied upon, in particular for large data sets where numerical errors tend to accumulate sufficiently to obscure identification. Dependencies not properly taken into account can lead to problems during estimation !
Additional effects to be zeroed out are specified in a block entry. The block begins with a line containing the code ZEROUT (can be abbreviated to ZER), and finishes with a line beginning with END. The block then should contain one line for each additional dependency. Each line should contain three entries :
In some instances, it is desirable to treat selected levels of a random, genetic effect as if it were ‘fixed’. A typical example is the analysis of dairy data under a sire model. If the data includes highly selected proven bulls which only have records from their second crop of daughters, we might want to treat these bulls as ‘fixed’ as we would expect estimates of the genetic variance to be biased downwards otherwise.
Treating selected random genetic effects levels as fixed is achieved by replacing the diagonal in the numerator relationship matrix by a value of zero when setting up the inverse of the numerator relationship matrix. If there is any pedigree information for the effect, this is ignored.
N.B.: Treating levels of random effect(s) as fixed may lead to additional dependencies in the fixed effects part of the model, especially for multivariate analyses. Great care must be taken to identify these and specify them explicitly, as outlined above (section 4.14). WOMBAT has no provision to account for this possibility !
Random genetic effects levels to treated as fixed are specified in a block entry. The block begins with a line containing the code PSEUDOFX (can be abbreviated to PSE). This can be followed (space separated) by a real number. If given, this value (which should be a small positive number, e.g. 0.0001 or 0.001) is used instead of the value of 0 when specifying the contributions to the inverse of the numerator relationship matrix. Effectively, this treats the respective effect level as “just a little bit random”. This option is provided as a mean to counteract additional dependencies in the model (see warning above). It is particularly useful for prediction, and should be used very carefully with estimation runs. As usual, the block finishes with a line beginning with END. The block then should contain one line for each effect. Each line should contain two entries :
N.B.: This option has only undergone fairly rudimentary testing !
It is not available in conjunction with the PX-EM algorithm.
Selecting this option prohibits re-use of inverse NRM matrices set up in any previous runs.
WOMBAT provides limited facilities to fix certain covariance components at their input values, while maximising the (log) likelihood with respect to the remaining parameters. Again, this information is given in a block entry. The block begins with a line containing the code FIXVAR (no abbreviation) and, as usual, ends with a line containing the code END. Each line within the block is then expected to have the following entries:
In the moment, no options to fix individual elements of covariance matrices are recognised.
A place to collect miscellaneous options which do not fit into the categories above is provided through a block entry beginning with a line SPECIAL, and ending with a line END.
Options are available to address the following problems:
A weighted analysis is specified by a line with the following space separated entries:
EXAMPLE: Consider an animal with two records and corresponding weights w1 = 0.8 and w2 = 2. Let the unweighted residual covariance matrix be
The weighted matrix is then
WOMBAT now checks for covariables which have a value of zero. If you fit a covariable which has such values, it expects you to confirm that this is intentional. This is done through a single line with space separated entries:
WOMBAT will stop if it encounters non-consecutive records for the same subject in different parts of the data file, assuming that the data has not been sorted as required. However, there are situations where this is a ‘planned’ scenario, for example to fit data on clones, where we want the same genetic effect but different residual effects for members of a clone. Hence, this check can be switched off by including a line with the code CLONES in the SPECIAL block.
WOMBAT reports solutions for fixed and random effects fitted – these are calculated as a by-product by all algorithms to obtain REML estimates of variance components. However, standard deviations/errors are only reported for those algorithms which invert the coefficient matrix in the mixed model equations. An option is provided to enforce this – this is invoked by a line (in a SPECIAL block) with the code FORCE-SE. If standard errors have been obtained automatically, this has no effect. Otherwise, it forces calculation of the inverse of the coefficient matrix at convergence.
N.B.: For large analyses, this can take quite some extra time.
For analyses incorporating genomic information in the relationship matrix for additive genetic effects, WOMBAT provides the opportunity to ‘back solve’ from predicted breeding values for marker effects and their standard error at the end of a BLUP or REML run, providing a GWAS scan; see Aguilar et al. [1]. This is invoked by a line (in a SPECIAL block) with the code BSOLVE-SNP rname, where rname is the name of the random effect representing additive genetic effects.
Details about additional information and input files needed and the output generated are available with Example 22.
WOMBAT attempts to check the data for records representing repeated measurements for a trait and individual, and stops when this is fairly low. This can be overridden to some extent: Analyses with repeated records for a trait for less than 20% and more than 10% of individuals can be enabled by a single line with the code REPEAT at the beginning of the line.
Standard multivariate analyses (MUV) with repeated records can represent a somewhat difficult scenario, as proper modelling of the residual covariance structure relies on differentiating between records for different traits are taken simultaneously and thus should be assumed to be subject to a common,temporary environmental (residual) covariance, and which should not. There are four strategies which can be chosen by specifying a single line with the following space separated entries:
EXAMPLE:
SPECIAL RPTCOV TSELECT mtime END
Specification of this option is required for multivariate analyses combining traits with single and repeated records.
When using the run option --snap, the covariable in the model representing SNP effects needs to be identified by a line in a SPECIAL block containing
In addition, a COVZER statement may be required to ensure WOMBAT treats count of zero as valid covariable values.
For ‘house-keeping’ reasons, WOMBAT requires that these covariables are specified (in the MODEL block) before any other, ‘regular’ covariables to be fitted.
EXAMPLE:
SPECIAL SNPEFF snp(1) COVZER snp(1) FIT END
The default in drawing samples from distribution of estimates of covariance matrices is to sample the parameters estimated, i.e. the elements of the Cholesky factor of the matrix of interest, and to construct the matrix samples from these. This yields samples within the parameter space. However, it can also yield substantial differences in the mean of samples and their variances from the ‘input’ values. Hence an option is provided to select sampling of the covariance matrices instead. This will yield means close to the REML estimates and estimates of sampling variances closer to those derived from the inverse of the average information matrix, but is likely to yield samples out of the parameter space.
To select a single covariance matrix to be sampled when approximating sampling variances (see subsection 5.2.9) requires a single line in a SPECIAL block with at least two entries:
EXAMPLE:
SPECIAL SAMPLEAI animal END
To fit a model with an additional random effect representing individuals’ competitive genetic effects, a random regression model analysis has to be selected – fitting random effects at an order of 1 with a basis function of “1” (option ONE) yields an equivalent model to a standard, univariate analysis whilst providing the flexibility to model residual variances as heterogeneous. WOMBAT then requires a “social” group code to be supplied and collects the identities of the individuals in each group as part of its set-up steps.
Additional information required is to be supplied by a single line in the SPECIAL block. This should contain the following, space separated entries:
For each run where a dilution factor has been specified, WOMBAT appends the value of the dilution factor used together with the corresponding maximum log likelihood to a file in the working directory named LogL4Quapprox.dat (see subsection 7.3.8). These represent points on the profile likelihood curve for the dilution factor, and a quadratic approximation can be used to determine its maximum. The run time option --quapp is provided to assist in this task.
When unknown parent groups (UPG) are to be treated as random, an alternative to including them in the inverse of the relationship matrix among additive genetic effects [see 48] is to fit them explicitly, i.e. to fit them as a separate random effect.
|
with y, b, u, g and e denoting the vectors of observations, fixed effects, random effects, group effects and residuals, respectively, and X and Z the corresponding design matrices. Q is the matrix linking individuals’ additive genetic effects to groups, with each row of Q potentially containing multiple non-zero elements, tracing the proportional contribution of genetic groups through the pedigree, which sum to unity. WOMBAT allows for Q to be supplied by the user or can set up Q in core if group codes for missing sire and dam identities are given.
Instructing WOMBAT to fit genetic groups thus requires these to be specified as a random effect in the model of analysis, and the corresponding (co)variance components statements need to be provided. In addition, an entry in a SPECIAL block at the bottom of the parameter file is needed. This should consist of a single line, with the following, space separated entries:
For example,
EXAMPLE:
MODEL ... # specify animal genetic effects first RAN animal GIN RAN gggrps IDE ... END ... SPECIAL # specify which random effect represents groups, # no. of levels and scale factor for proportions GENGROUPS gggrps 27 1 animal ... END
Depending on the covariance option for additive genetic effects, information on Q or the UPG codes is expected to be read either from the corresponding pedigree file (NRM), starting with column 5 or from the *.codes file (GIN), beginning in column 4.
The elements of the rows of Q are read as a string of g space separated values (with g the number of groups as given in the SPECIAL block) for each individual. These values are assumed to have been obtained by multiplying the proportions with the scale factor specified.
For example, for g = 5 and a scale factor of 1000, the *.codes file might be:
EXAMPLE:
1 2301 1 900 0 100 0 0 2 2302 1 0 900 0 0 100 3 2303 2 100 0 800 100 0 4 2304 2 0 0 0 0 1000 ...
with columns 1 and 2 the running number and original animal code, column 3 the type of animal (1 = non genotyped, 2 = genotyped; for compatibility, this code needs to be included for all run options if genetic groups are fitted, including those that do not treat genotyped animals differently). Columns 4 to 8 then give the proportions of group membership, multiplied by the scale of 1000 for each of the 5 groups.
If it is chosen to supply UPG codes instead, these are expected to be given as columns 5 (sire) and 6 (dam) in the pedigree or columns 4 (sire) and 5 (dam) in the *.codes file. NB: For this option the genetic group codes need to be coded from 1 to the number of groups
The default for WOMBAT is to repeatedly read data and relationship information from disk so as to minimize memory requirements. In-core storage of selected information can be specified with a line in a SPECIAL block. This should begin with the code INCORE followed (space separated) by one or more qualifier(s) choosing which information is to be stored in core. Valid qualifiers are NRM (inverse of numerator relationship matrix), GIN (*.gin matrix), DATA (recoded data), RAW (raw data), ZRE (*.matrix or *.chlsky required or reduced rank analyses usng AI-REML) and ALL (all five).
EXAMPLE:
SPECIAL ... INCORE GIN DATA ... END
Calculation of alternative outlier model (AOM) statistics for residuals or random effects fitted is invoked by specifying the option AOM-RES or AOM-RND by itself on a line in a SPECIAL block in the parameter file.
EXAMPLE:
SPECIAL ... AOM-RES ... END
The AOM options are implemented for uni- and full rank standard multivariate analyses only; they are ignored for reduced rank (PC) analyses and random regression models. Calculations are carried out for one observation or random effects level at a time. For large models or cases with dense coefficient matrices C, this can take quite a long time.
AOM statistics for residual effects, R−1e∕, are reported together with residuals in Residuals.dat. This file has one row per record with up to seven columns. Columns 1 to 3 are written out for all analyses and contain the residual, the predicted observation and the actual observation (for reference). Specifying AOM-RES adds the diagonal element of the projection matrix (pii; column 4), the respective element of R−1e (the nominator of the AOM statistic; column 5) and the AOM statistic (column 6). For univariate analyses, the corresponding diagonal element of the “Hat” matrix (σ2[1 − σ2pii] with σ2 the residual variance) is given in addition (column 7).
AOM statistics for random effects, G−1u∕, are reported alongside the corresponding solutions in the RnSoln_*.dat files. This file has up to 10 columns. If AOM-RND has been specified, the last three columns give the diagonal element of the covariance matrix (T_ii), the element of G−1u ((G{-1}u)_i) and the AOM statistic, respectively. Where the denominator is essentially zero (e.g. for additive genetic effects for parent without records), the AOM statistics is set to zero.
Kernel based prediction using genomic information is attracting increasing interest. This can be parameterised as
To facilitate estimation of the bandwidth parameter, WOMBAT has been adapted to calculate the inverse of the kernel matrix for a chosen value of 𝜃, with dij the ’distance’ for individuals i and j.
To fit a Gaussian kernel matrix as ‘relationship matrix’, the random effect which this matrix pertains to needs to be specified with the covariance option GIN. The corresponding *.gin file should follow the same format as for other effects with such covariance structure, except that the elements given should be the squared distances, dij2 and that the diagonals of 0 can be omitted. Only rudimentary checks of the input values are performed, e.g. WOMBAT stops if a zero distance between a pair of individuals is found, as this will result in a non-positive definite matrix.
The bandwidth parameter to be used needs to be specified in a SPECIAL block. This should contain a line starting with the code KERNEL followed (space-separated) by the name of the random effect as given in the MODEL block, and the value of 𝜃. E.g. for random effect animal and 𝜃 = 1.5
EXAMPLE:
SPECIAL ... KERNEL animal 1.5 ... END
WOMBAT will then construct the kernel matrix for the given value of 𝜃 and attempt to invert it, as well as calculate its log determinant. K is assumed to be dense and is automatically stored ‘in-core’. Inversion is carried out using LAPACK [3] routine DPFTRF [19] which requires a positive definite matrix.
On completion of the estimation step for a given value of 𝜃, it is written to the file LogL4Quapprox.dat together with the corresponding maximum log likelihood. To determine the next value on the profile likelihood for 𝜃 to be evaluated, WOMBAT can be run with the command line option --quapp (see subsection 5.2.12) which performs a quadratic approximation of the profile log likelihood for 𝜃.
By default, WOMBAT calculates and writes out a small number of correlations on the ’original scale’ for the estimated covariance functions. These are primarily given to assist with checking your own calculations to obtain the correlations you are interested in - covariances and correlations are easily calculated using the file of evaluated basis functions provided.
The complete correlation matrix is not provided by default as this might result in big output files and many calculations needed. However, if you really want correlations for all values of the control value, you can specify this by a single line in a SPECIAL block containing the code RRCORR-ALL. If the analysis fits a single control variable, this writes out not only RanRegCorrels.dat, but also writes an additional file RanRegCorrAll.dat (see subsection 7.2.4) with the same information in a format more convenient for further processing, e.g. plotting.
WOMBAT has a special module for a pre-analysis step to carry out a number of calculations related to genomic relationship matrices and single step analyses.
Documentation on the various options available is given in a separate pdf file distributed with Example 20.
WOMBAT can carry out estimation imposing a penalty on the log likelihood function to improve the sampling properties of estimates. Three types of penalties are accommodated:
As the canonical transformation involves only two matrices, this type of penalty is restricted to multivariate analyses fitting a simple animal model or random regression analyses fitting animals’ genetic and permanent environmental effects only.
Penalized estimation is invoked by one or more lines in a SPECIAL block. It generates several additional output files, described in subsection 7.2.11.
Penalized estimation is specified by a line with space separated entries:
The following variables depend on the form of penalty chosen.
For CANEIG:
This variable is optional; if omitted, it is set to the same value as that given for the scale of the penalty (LOG for LOGB).
EXAMPLE:
SPECIAL PENALTY CANEIG LOG 5.0 ENDspecifies a penalty on the canonical eigenvalues transformed to logarithmic scale for a tuning factor of ψ = 5.0.
EXAMPLE:
SPECIAL PENALTY CANEIG ORG 6 0.5 1.0 2.5 5.2 8.7 11.0 END
EXAMPLE:
SPECIAL PENALTY CANEIG LOG TIC -2 0.0 3.0 0.5 5.0 8.0 1.0 END
All values of ψ given should be in ascending order.
For COVARM and CORREL:
EXAMPLE:
SPECIAL PENALTY COVARM PHENV animal 0.0 END
with T the shrinkage target and C = (ψ + q + 1)∕ψ. The latter value originates from the assumption of an Inverse Wishart distribution for Σ. This option sets C = 1, i.e. applies a penalty proportional to the Kullback-Leibler divergence between Σ and T.
EXAMPLE:
SPECIAL PENALTY COVARM KLDIV animal 7.50 END
EXAMPLE:
SPECIAL PENALTY CORREL animal -1 0.0 5.0 1.0 END
For KANEIG: see example 19 For KORREL: see example 19 For PACORR:
If omitted, the default shrinkage target is the identity matrix.
EXAMPLE:
SPECIAL PENALTY PACORR animal 4.0 END
If multiple tuning factors are given, it is sometimes desirable to stop once the deviation in the unpenalized log likelihood from the maximum (at ψ = 0) has exceeded a certain limit. This can be specified by an additional line with three space-separated entries:
EXAMPLE:
SPECIAL PENALTY CORREL KLDIV animal -2 0.0 5.0 0.5 6.0 15.0 1.0 PENALTY LIMIT 3.315 END
For this option to work, the first tuning factor given has to be ψ = 0!
HINT: Use run time option --–valid (subsection 5.3.2) for validation steps when using cross-validation to estimate the value for ψ to be used.