4.8 Model of analysis

This is another block entry. The block begins with a line containing the code MODEL (can be abbreviated to MOD) and finishes with a line beginning with END. The block should then contain one line for each effect to be fitted and one line for each trait in the analysis.

4.8.1 Effects fitted

Each of the ‘effect’ lines comprises the following:

(a) A three-letter code for the type of effect.
(b) The effect name, where the effect name is a combination of the variable name for a column in the data file and, if appropriate, some additional information. No abbreviations for variable names are permitted, i.e. there must be an exact match with the names specified in the DATA block.
(c) If the effect is to be fitted for a subset of traits only, the running numbers of these traits must be given (space separated).
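
The line layout described above can be sketched as a small Python helper that assembles a MODEL block as text. This is an illustration only and not part of WOMBAT: the function names and the variable names (cg, age, weight, height) are hypothetical, and nothing beyond the layout rules stated in this section is assumed.

```python
def effect_line(code, name, *extra):
    """Build one effect line: effect code, effect name, then any further
    space-separated items (e.g. the running numbers of a subset of traits)."""
    return " ".join([code, name, *map(str, extra)])


def trait_line(name, number, missing=None):
    """Build one trait line: TRAIT, trait name, running number and,
    optionally, an integer code marking 'missing' values."""
    parts = ["TRAIT", name, str(number)]
    if missing is not None:
        parts.append(str(int(missing)))
    return " ".join(parts)


def model_block(effects, traits):
    """Wrap the effect and trait lines between a MODEL line and an END line."""
    return "\n".join(["MODEL", *effects, *traits, "END"])


# Hypothetical two-trait model: 'cg' fitted for both traits,
# the covariable 'age' fitted for trait 1 only.
block = model_block(
    [effect_line("FIX", "cg"), effect_line("COV", "age(2)", 1)],
    [trait_line("weight", 1), trait_line("height", 2)],
)
print(block)
```

Generating the block from one list of names makes it easier to keep the exact match with the names in the DATA block that WOMBAT requires.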

4.8.1.1 Fixed effects

Fixed effects can be cross-classified or nested fixed effects or covariables. The following codes are recognised:
FIX This specifies a fixed effect in the model of analysis.

NB The name for a fixed effect should not contain a “(”, otherwise it is assumed that this effect is a covariable.

A simple, one-way interaction of two variables can be specified as vn1*vn2, with vn1 and vn2 valid variable names. [Not yet implemented!]

HINT: ‘Not implemented’ here means merely that WOMBAT will not code the interaction for you. You can, of course, still fit a model with an interaction effect, but you have to a) insert an additional column in the data file with the code for the appropriate subclass, b) fit this as if it were a cross-classified fixed effect, and c) explicitly specify any additional dependencies arising (using ZEROUT, see below).


COV This specifies a fixed covariable. The effect name should have the form “vn(n,BAF)”, where vn is a variable name, the integer n gives the number of regression coefficients fitted and BAF stands for a three-letter code describing the basis functions to be used in the regression. NB: By default, intercepts for fixed covariables are not fitted.

Valid codes for basis functions are
POL for ordinary polynomials.
This is the default and can be omitted, i.e. “vn(n)” is equivalent to “vn(n,POL)”. For instance, n = 2 denotes a quadratic regression. Note that WOMBAT deviates both records and covariables from their respective means prior to analysis.

N.B.: This yields a regression equation of the form

$$y - \bar{y} = b_1 (x - \bar{x}) + b_2 (x - \bar{x})^2 + \cdots$$

rather than an equation of the form

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots$$

This should be borne in mind when interpreting any solutions for regression coefficients for POL covariables from WOMBAT: while there is a straightforward relationship between the coefficients $\beta_i$ and $b_i$, they are not interchangeable.
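
The relationship between the two sets of coefficients follows from expanding each power of the deviation from the mean with the binomial theorem. The Python sketch below carries out that expansion; it is an illustration only, the function name is hypothetical, and it assumes that the means of the covariable and of the records used in the analysis are known to the user.

```python
from math import comb


def centred_to_raw(b, x_mean, y_mean):
    """Convert b_1..b_n of  y - y_mean = sum_i b_i (x - x_mean)^i
    into beta_0..beta_n of  y = sum_j beta_j x^j."""
    beta = [0.0] * (len(b) + 1)
    beta[0] = y_mean
    for i, b_i in enumerate(b, start=1):
        # Binomial expansion: (x - m)^i = sum_j C(i, j) x^j (-m)^(i - j)
        for j in range(i + 1):
            beta[j] += b_i * comb(i, j) * (-x_mean) ** (i - j)
    return beta


# Hypothetical quadratic regression with b_1 = 1.0 and b_2 = 0.5
beta = centred_to_raw([1.0, 0.5], x_mean=2.0, y_mean=10.0)
print(beta)  # [10.0, -1.0, 0.5]
```

In this example the quadratic coefficient is unchanged, while the linear coefficient and the intercept both depend on the mean of the covariable.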


LEG for Legendre polynomials.

For example, n = 3 denotes a cubic polynomial, i.e. comprises a linear, quadratic and cubic regression coefficient, but no intercept (this differs from the implementation for random regressions, where n = 4 denotes a cubic polynomial).
BSP for B-spline functions

For analyses fitting spline functions, the degree of the spline is selected by specifying “L”, “Q” or “C” for linear, quadratic and cubic, respectively, immediately (no space) after the code BSP. Note that the default spline function is an equidistant B-spline (i.e. the range of values of the covariable is divided into intervals of equal size), with the number of knots and intervals determined from the number of regression coefficients and the degree of the spline: k = n − d + 1, where k is the number of knots and d is the degree (d = 1 for “L”, d = 2 for “Q” and d = 3 for “C”) [19, section 2.2]. Other spline functions are readily fitted as user-defined basis functions.
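
As a worked illustration of the rule k = n − d + 1, the Python sketch below computes the number of knots and places them at equal distances. The function name is hypothetical, and the placement of the first and last knot at the ends of the range (so that k knots delimit k − 1 intervals) is an assumption made for this example, not a statement about WOMBAT's internals.

```python
def equidistant_knots(n_coeff, degree_code, lower, upper):
    """Return k = n - d + 1 equally spaced knots on [lower, upper]."""
    degree = {"L": 1, "Q": 2, "C": 3}[degree_code]
    k = n_coeff - degree + 1
    if k < 2:
        raise ValueError("fewer than two knots cannot span a range")
    # Assumption: the outer knots sit at the ends of the range
    step = (upper - lower) / (k - 1)
    return [lower + i * step for i in range(k)]


# Cubic spline ("C") with 5 regression coefficients: k = 5 - 3 + 1 = 3
knots = equidistant_knots(5, "C", 0.0, 10.0)
print(knots)  # [0.0, 5.0, 10.0]
```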
USR for user-defined functions

Fitting of an intercept (in addition to deviation from means) can be enforced by preceding n with a minus sign; this is not recommended unless there are no other fixed effects in the model.

A covariable to be fitted as nested within a fixed effect is specified as “vn1*vn2(n,BAF)”, with vn1 the name of the fixed effect. If vn1 is not fitted as a fixed effect, it must be specified as an ‘extra’ effect (see below).

WOMBAT is fussy when it encounters a covariable which has a value of zero: covariables which can take such a value are only permitted if a SPECIAL option is given which confirms that these are valid codes and not ‘missing’ values; see 4.10.6.

4.8.1.2 Random effects

Random effects include the ‘control variables’, i.e. random covariables for random regression analyses. The following codes are recognised:
RAN This code specifies a random effect. It should be followed (space separated) by the name of the random effect. After the name, a three-letter code describing the covariance structure of the effect can be given.
Valid codes for covariance structures are:
NRM which denotes that the random effect is distributed proportionally to the numerator relationship matrix.
If this code is given, a pedigree file must be supplied.
SEX which denotes that the random effect is distributed proportionally to the numerator relationship matrix for X-linked genetic effects. The inverse of this matrix is set up directly from the pedigree information, as described by Fernando and Grossman [6].
If this code is given, it is assumed that an autosomal genetic effect (option NRM with corresponding pedigree file) is also fitted and has already been specified. In addition, the pedigree file is expected to have an additional column specifying the number of X-chromosomes for each individual.
IDE which denotes that different levels of the random effect are uncorrelated. This is the default and can be omitted.
GIN which denotes that the random effect is distributed proportionally to an ‘arbitrary’ (relationship or correlation) matrix.
The user must supply the inverse of this matrix in the form outlined in 6.
PEQ which denotes a permanent environmental effect of the animal for data involving ‘repeated’ records, which is not to be fitted explicitly. Instead, an equivalent model is used, which accounts for the respective covariances as part of the residual covariance matrix. This is useful for the joint analysis of traits with single and repeated records.

N.B.: Do not use this option for other effects - WOMBAT has no mechanism for checking that this option is appropriate.


For ‘standard’ uni- and multivariate analyses, the random effect name is simply the variable name as given for the data file.

For random regression analyses, the variable name is augmented by information about the number of random regression coefficients for this effect and the basis functions used. It becomes “vn(n,BAF)”, analogous to the specification for covariables above. As above, n specifies the number of regression coefficients to be fitted. In contrast to fixed covariables, however, an intercept is always fitted. This implies that n gives the order, not the degree of fit. For instance, n = 3 in conjunction with a polynomial basis function specifies a quadratic polynomial with the 3 coefficients corresponding to the intercept, a linear and a quadratic term. WOMBAT allows for different control variables to be used to fit random regressions for different effects. If the model has more than one RRC statement (see below), the specification of random effects needs to be extended to tell WOMBAT which control variable to use for which effect, i.e. “vn(n,BAF,rrcn)” with rrcn the name of the control variable.
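
Because n means the degree for fixed covariables but the order for random regressions, it is easy to be off by one. The Python sketch below composes an effect name in the form given above and reports the implied polynomial degree; the function names and the variable names (animal, age) are hypothetical and serve only as an illustration.

```python
def regression_name(vn, n, baf="POL", rrcn=None):
    """Compose an effect name of the form vn(n,BAF) or vn(n,BAF,rrcn)."""
    parts = [str(n), baf]
    if rrcn is not None:
        parts.append(rrcn)
    return f"{vn}({','.join(parts)})"


def polynomial_degree(n, random):
    """Degree of a polynomial fit with n regression coefficients.

    Random regressions always include an intercept, so n is the order
    (degree + 1); fixed covariables omit it by default, so n is the degree.
    """
    return n - 1 if random else n


print(regression_name("animal", 3, "LEG", "age"))  # animal(3,LEG,age)
print(polynomial_degree(3, random=True))           # 2 (quadratic)
print(polynomial_degree(3, random=False))          # 3 (cubic)
```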

Valid codes for BAF are:
POL for ordinary polynomials.
This is the default and can be omitted, i.e. “vn(n)” is equivalent to “vn(n,POL)”.
LEG for Legendre polynomials.
BSP for B-spline functions
USR for user-defined functions
IDE for an identity matrix, i.e. the i-th basis function has a single coefficient of “1” for the i-th coefficient with all other elements zero. This requires the number of RR coefficients to be equal to the number of levels of the control variable.
It is useful when fitting a multi-trait model as a RR model, e.g. to allow for heterogeneous residual variances.
ONE to assign a value of unity (“1”) to all coefficients.
This option has been introduced to facilitate a standard multivariate analysis with repeated records by fitting a random regression model to model an arbitrary pattern of temporary environmental covariances correctly.

RRC This code specifies a ‘control variable’ in a random regression analysis. It should be followed (space separated) by the name of the variable, as given in the DATA statement.
Optionally, immediately after the name (no spaces), the range of values of the variable to be considered can be specified as (m − n), with m the lower and n the upper limit.

N.B.: WOMBAT expects the value of the control variable to be non-negative (i.e. 0 or greater) and not to be a fraction (the control variable is read as a real variable but converted to the nearest integer, e.g. values of 3.0, 3.1 and 3.4 would be treated as 3). Scale your values appropriately if necessary!
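
A quick check of the data before running an analysis can show whether scaling is needed. The Python sketch below flags negative values and values that would change when converted to the nearest integer. It is an illustration only: the function name is hypothetical, and the treatment of exact halves (rounded up here) is an assumption, as this section gives no example for such values.

```python
import math


def check_control_values(values):
    """Return (integer codes, list of problems) for control variable values."""
    codes, problems = [], []
    for v in values:
        if v < 0:
            problems.append(f"{v}: negative value")
        # Assumed tie rule: exact halves are rounded up
        code = math.floor(v + 0.5)
        if code != v:
            problems.append(f"{v}: would be read as {code}")
        codes.append(code)
    return codes, problems


codes, problems = check_control_values([3.0, 3.1, 3.4])
print(codes)     # [3, 3, 3]
print(problems)  # ['3.1: would be read as 3', '3.4: would be read as 3']

# Hypothetical half-unit values become distinct integers after scaling by 2
scaled = [2 * v for v in [0.5, 1.0, 1.5]]
print(check_control_values(scaled))  # ([1, 2, 3], [])
```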

N.B.: For multivariate analyses, WOMBAT collects the range of values for the control variable (i.e. minimum and maximum) across all traits. This may be undesirable if different traits have distinct ranges and Legendre polynomials are used as basis functions. If so, use the USR option and set up your own file which maps the range for each trait exactly as you want it!
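
To see why the joint range matters, recall that Legendre polynomials are defined on the interval from −1 to 1, so the control variable has to be rescaled to that interval before they are evaluated. The Python sketch below uses the usual linear rescaling; the function name and the age ranges are hypothetical, and the code illustrates the principle rather than WOMBAT's exact computations.

```python
def standardise(t, t_min, t_max):
    """Map t linearly from [t_min, t_max] onto [-1, 1]."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


# Hypothetical data: trait 1 recorded at ages 1 to 11, trait 2 at ages 11 to 21
joint = standardise(11, 1, 21)  # range collected across both traits
own = standardise(11, 1, 11)    # range of trait 1 only
print(joint, own)  # 0.0 1.0
```

With the joint range, the last age recorded for trait 1 falls in the middle of the interval, so the polynomials for trait 1 are evaluated over only half of their domain.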


4.8.1.3 ‘Subject’ identifier

For animal model analyses, WOMBAT assumes that the code for the first genetic effect fitted also identifies the subject on which measurements are taken. For some analyses (in particular those not fitting any additive genetic effects!) and sire models this is not appropriate, and such an identifier needs to be supplied as an extra column in the data file.
SUBJ This code needs to be followed by the name of the variable which identifies the individual.

4.8.1.4 ‘Extra’ effects

For some models, coding is required for effects which are not explicitly fitted in the model of analysis, for instance, when fitting nested effects. These need to be specified as ‘extra’ effects.
EXT This code, followed by the respective variable name, denotes an effect which is not fitted in the model but which is required to define other effects.

4.8.2 Traits analysed

One line should be given for each trait. It should contain the following information:

(a) The code TRAIT (can be abbreviated to TR).
(b) The name of the trait, as specified in the DATA block.
(c) The running number of the trait.
  • In most cases, this is simply the number from 1 to q, where q is the total number of traits in a multivariate analysis.
  • In addition, WOMBAT provides the opportunity to replace the trait number in the data file with a different number. This is useful, for instance, to carry out an analysis involving a subset of traits without the need to edit the data file. The syntax for this is: “k”->m. This specifies that value k in the data file should be replaced with value m for the analysis. If this is encountered, any records with trait numbers not selected in this fashion are ignored, i.e. this provides a mechanism for automatic subset selection.

    HINT: All q traits in the analysis should be specified in this manner, even if k = m for some trait(s).

(d) Optional: A numeric value (integer) representing a ‘missing’ value; any records with the trait equal to this value are ignored in the analysis (default is −123456789).¹
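
The effect of the “k”->m syntax and of the ‘missing’ value can be mimicked in a few lines of Python. This is a sketch of the selection logic as described above, not WOMBAT code; the function name, the record layout (trait number, value) and the example data are hypothetical.

```python
def select_traits(records, mapping, missing=-123456789):
    """Renumber traits according to mapping {k: m}. Records whose trait
    number is not a key of the mapping, or whose value equals the
    'missing' code, are dropped."""
    return [
        (mapping[k], value)
        for k, value in records
        if k in mapping and value != missing
    ]


records = [(1, 10.2), (2, 3.4), (3, 7.7), (1, -123456789), (3, 8.1)]
# Analogue of specifying "1"->1 and "3"->2 for a two-trait analysis
subset = select_traits(records, {1: 1, 3: 2})
print(subset)  # [(1, 10.2), (2, 7.7), (2, 8.1)]
```

Records for trait 2 in the data file are skipped because that trait number is not selected, and the record carrying the missing-value code is skipped as well.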

¹ This may sound contradictory to 6.2 - but that comment pertains to multivariate analyses, where WOMBAT expects a separate record for each trait. The ‘missing value’ here merely represents a simple mechanism which allows selected records in the data file to be skipped.