4.10 Special information

Finally, the parameter file can select some special features.

4.10.1 Dependencies among fixed effects

WOMBAT requires a coefficient matrix in the mixed model equations which is of full rank. Hence, constraints must be placed on the fixed effects part of the model if more than one fixed effect is fitted. By default, the first level of each cross-classified fixed effect other than the first is ‘zeroed out’ for each trait to account for dependencies. If there are additional dependencies, these should be identified and specified explicitly prior to each analysis.

WOMBAT performs a simple least-squares analysis on the fixed effects part of the model, attempting to find such additional dependencies. However, this procedure should not be relied upon, in particular for large data sets where numerical errors tend to accumulate sufficiently to obscure identification. Dependencies not properly taken into account can lead to problems during estimation !

Additional effects to be zeroed out are specified in a block entry. The block begins with a line containing the code ZEROUT (can be abbreviated to ZER), and finishes with a line beginning with END. The block then should contain one line for each additional dependency. Each line should contain three entries :

(a)
The name of the fixed effect, as specified in the MODEL block.
(b)
The ‘original’ code for the level to be zeroed out, as encountered in the data file.
(c)
The trait number; this can be omitted for univariate analyses.

4.10.2 Random effects to be treated as fixed

In some instances, it is desirable to treat selected levels of a random, genetic effect as if it were ‘fixed’. A typical example is the analysis of dairy data under a sire model. If the data includes highly selected proven bulls which only have records from their second crop of daughters, we might want to treat these bulls as ‘fixed’ as we would expect estimates of the genetic variance to be biased downwards otherwise. In other cases, our pedigree records may include codes for ‘parents’ which are not animals but represent genetic groups, to be treated as fixed effects.

Treating selected random genetic effects levels as fixed is achieved by replacing the diagonal in the numerator relationship matrix by a value of zero when setting up the inverse of the numerator relationship matrix. If there is any pedigree information for the effect, this is ignored.

N.B.: Treating levels of random effect(s) as fixed may lead to additional dependencies in the fixed effects part of the model, especially for multivariate analyses. Great care must be taken to identify these and specify them explicitly, as outlined above (4.10.1). WOMBAT has no provision to account for this possibility !

Random genetic effects levels to treated as fixed are specified in a block entry. The block begins with a line containing the code PSEUDOFX (can be abbreviated to PSE). This can be followed (space separated) by a real number. If given, this value (which should be a small positive number, e.g. 0.0001  or 0.001  ) is used instead of the value of 0  when specifying the contributions to the inverse of the numerator relationship matrix. Effectively, this treats the respective effect level as “just a little bit random”. This option is provided as a mean to counteract additional dependencies in the model (see warning above). It is particularly useful for prediction, and should be used very carefully with estimation runs. As usual, the block finishes with a line beginning with END. The block then should contain one line for each effect. Each line should contain two entries :

(a)
The name of the random effect, as specified in the MODEL block.
This should be a genetic effect, distributed proportionally to the numerator relationship matrix among animals.
(b)
The ‘original’ code for the level to be treated as ’fixed’, as given in the pedigree file.

N.B.: This option has only undergone fairly rudimentary testing !
It is not available in conjunction with the PX-EM algorithm.
Selecting this option prohibits re-use of inverse NRM matrices set up in any previous runs.

4.10.3 Covariance components to be treated as fixed

WOMBAT provides limited facilities to fix certain covariance components at their input values, while maximising the (log) likelihood with respect to the remaining parameters. Again, this information is given in a block entry. The block begins with a line containing the code FIXVAR (no abbreviation) and, as usual, ends with a line containing the code END. Each line within the block is then expected to have the following entries:

(a)
The name of the random effect, as specified in the MODEL block.
(b)
The code ALL to signify that all pertaining covariances are fixed.

In the moment, no options to fix individual elements of covariance matrices are recognised.

4.10.4 Additional functions of covariance components

In addition, users can define other functions of covariance components which should be calculated and for which sampling errors should be approximated. This is done in a block entry, beginning with a line containing the code SE+USR (can be abbreviated to SE+), and ending with a line beginning with END. The block should then contain one line for each function to be calculated. The content of the line depends on the type of function. Three types are recognised

1.
Linear combinations (weighted sums) of the covariance components in the model of analysis. For these, space separated entries o should be
(a)
The code SUM at the beginning of the line.
(b)
A name to identify the sum to be calculated.
(c)
An entry of the form n(w)  for each component of the weighted sum, with n  the running number of the covariance component and w  the weight it is to be multiplied with. If w  is unity, it can be omitted, i.e. the entry n  is interpreted as n(1)  .
2.
Ratios of two covariance components. For these, the line should have three entries
(a)
The code VRA at the beginning of the line.
(b)
A name for the variance ratio to be calculated.
(c)
The running number of the covariance component in the numerator.
(d)
The running number of the covariance component in the denominator.
3.
Correlations, i.e. the ratio of a covariance component and the square root of the product of two other covariances. Line entries for these are
(a)
The code COR at the beginning of the line.
(b)
A name to identify the correlation to be calculated.
(c)
The running number of the covariance component in the numerator.
(d)
The running number of the first covariance component in the denominator.
(e)
The running number of the second covariance component in the denominator.

EXAMPLE:

SE+USR
  SUM  siga+pe  1  2
  VRA  repeat   5  4
END

Consider a univariate analysis with repeated records per individual. Fitting additive genetic and permanent environmental effects of the animal, variance components in the model are  2
σA  ,  2
σPE  and  2
σE  (residual), with running numbers 1, 2 and 3, respectively. WOMBAT automatically calculates the phenotypic variance σ2
 P  = σ2 + σ2 + σ2
 A    PE   E  and gives it running number 4. To calculate the repeatability and approximate its sampling error, we first need to define the sum of σ2A  and σ2PE  as a new covariance (which receives running number 5), and then define the repeatability as the ratio of this component and σ2P  .

HINT: Run WOMBAT with the --setup option to begin with. Inspect the file ListOfCovs generated in this step – this gives a list of running numbers for individual covariance components.

4.10.5 Pooling estimates of covariance components

Information related to pooling of covariance components using a (penalized) maximum likelihood approach is specified in a block entry, beginning with a line containing the code POOL and ending with a line beginning with END. Within the block, the following directives are recognized:
PSEUPED followed (space separated) by a three letter code specifying the assumed pedigree structure and values on sizes or numbers of families. The following pseudo-pedigree codes are available:
PHS denotes a simple balanced paternal half-sib design. Optionally, two integer numbers giving the numbers of sires and the number of progeny per sire, respectively, can be given (space separated, on the same line) after the code. If not given, default values of 10  and 4  are used.

This option is suitable for a simple animal model only. WOMBAT will check that a) there is only one random effect fitted, and b) that this has covariance option NRM. If the MINPAR option (see below) is used, WOMBAT cannot perform these checks; hence the DIRADD code together with the name of the random effect, as described below, need to be given.
HFS implies a balanced hierarchical full-sib design comprised of s  sires, d  dams per sire and n  progeny per dam. Assumed values for s  , d  and n  can be given (space-separated) on the same line. If omitted, default values of s = 10  , d = 5  and n = 4  are used. This option is suitable for a simple animal model or a model fitting maternal permanent environmental effects in addition. Again, if the MINPAR option is used, codes DIRADD and MATPE need to be given in addition.
BON selects a design comprising 8 individual per family in two generations, due to ? ]. Optionally, this code can be followed (space separated) by the number of such families (integer); if not given, a default value of 2 is used. For this design, expectations of covariances between relatives due to direct and maternal effects are available. In order for WOMBAT to ‘know’ which random effect has which structure, additional information is required. This should comprise one additional line per random effect fitted, with each line consisting of a keyword specifying the type of random effect followed (space separated) by the name of the effect as specified in the model of analysis. Keywords recognized are DIRADD for direct additive genetic, MATADD for maternal genetic and MATPE for maternal permanent environmental effects.

EXAMPLE:

MODEL
   RAN     animal  NRM
   RAN     gmdam   NRM
   RAN     pedam   IDE
   trait   bwgt    1
   trait   wwgt    2
   trait   ywgt    3
END MOD
VAR animal   3  NOS
VAR gmdam    3  NOS
VAR pedam    3  NOS
VAR residual 3  NOS
POOL
   PSEUPED    BON  10
      DIRADD  animal
      MATADD  gmdam
      MATPE   pedam
   SMALL      0.0010
   DELTAL     0.0010
END

USR Other designs and models are readily accommodated by specifying the expectations of covariances between family members for each random effect fitted (excluding residuals - this is set automatically to be an identity matrix). This is selected by the USR option which must be followed (space separated) by an integer specifying the family size. Optionally, a second number giving the number of families is recognized (a default value of 2 is used if not given). Then the upper triangle of the matrix of coefficients in the expectation of covariances has to be given for each random effect fitted. For multiple random effects, these have to be given in the same order in which they have been specified in the VAR statements in parameter file, and the matrix for each effect has to begin on a new line.

EXAMPLE:

MODEL
  RAN  animal  NRM
    ...
END MOD
VAR   animal   4  NOSTART
VAR   residual 4  NOSTART
POOL
   PSEUPED  USR  5
   1.0  0.50  0.50  0.50  0.50
   1.0  0.25  0.25  0.25
   1.0  0.25  0.25
   1.0  0.25
   1.0
END

This shows the coefficients for direct additive genetic effects for a family comprising a sire (individual 1) with four progeny from unrelated dams.



SMALL followed (space separated) by a real number which gives the lower limit (≤ 1  ) for smallest eigenvalue allowed in the combined matrices. If this is not specified, a default value of 0.0001  is used.
DELTAL followed (space separated) by a real number specifying the convergence criterion: if the increase in log likelihood between iterates falls below this value, convergence is assumed to have been achieved. If not specified a stringent default of 0.00005  is used.
MINPAR The default assumption for the parameter file used is that it is a ‘full’ parameter file as for a corresponding, multivariate analysis. However, the information on pedigree and data file and their layout are not actually required. Hence the option MINPAR (on a line by itself) is provided which allows use of a ‘minimum’ parameter file, switching off some of the consistency checks otherwise carried out. If used, MINPAR must be the first entry within the POOL block.

The minimum information to be given in the parameter file must comprise:

1.
The ANALysis statement
2.
A VAR line for each covariance matrix, together with the NOSTART option telling WOMBAT not to expect a matrix of starting values.
3.
The POOL block, including statements showing which random effect represents which type of genetic or non-genetic effect.

EXAMPLE:

ANAL MUV 14
VAR  animal    14  NOSTART
VAR  residual  14  NOSTART
POOL
   MINPAR
   SMALL  0.001d0
   PSEUPED  hfs  100  10  4
     DIRADD  animal
END

SINGLE The default form of input for results from part analysis is to read estimates from separate files, in the form of output generated by WOMBAT when carrying out multiple analyses of subsets of traits. Alternatively, all information can be given in a single file. This is selected by the option SINGLE, followed (space separated) by the name of the input file (same line). The layout required for this file is described in 6.5.4.2.
PENALTY followed (space separated) by a code word(s) defining the type and strength of penalty to be applied. Codes for the penalty type recognised are:
CANEIG selects a penalty on the canonical eigenvalues. This has to be followed (space separated) by either ORG or LOG specifying a penalty on eigenvalues on the original scale or transformed to logarithmic scale (no default).
COVARM specifies shrinkage of a covariance matrix towards a given target.
CORREL chooses shrinkage of a correlation matrix towards a given target correlation matrix.

Either COVARM or CORREL can be followed (space separated) by the keyword MAKETAR. If this is given, WOMBAT determines the shrinkage target as the phenotypic covariance (or correlation) matrix obtained by summing estimates of covariances for all sources of variation from the preceding, unpenalized analysis. If this is not given, the upper triangle of the target matrix is expected to be read from a file with the standard name PenTargetMatrix; see 6.5.7.1.

The last entry on the line relates to the tuning factor(s) to be used: If a single penalized analysis is to be carried out, the corresponding tuning factor should be given (real value). To specify multiple penalized analyses, specify the number of separate tuning factors multiplied by − 1  as an integer value (e.g. -3 means 3 analyses), and list the corresponding tuning factors space separated on the next line.

EXAMPLE:

1.
Shrink all correlation matrices towards the phenotypic correlation matrix using a single tuning factor of 0.1  , and calculate the shrinkage target from unpenalized results.
PENALTY CORREL  MAKETAR  0.1
2.
Shrink canonical eigenvalues on the logarithm scale towards their mean, using 5 different tuning factors
PENALTY CANEIG LOG  -5
0.01  0.1  0.5  1.0  2.0

4.10.6 Miscellaneous

A place to collect miscellaneous options which don’t fit into the categories above is provided through a block entry beginning with a line SPECIAL, and ending with a line END. Options are available to address the following problems:

4.10.6.1 Weighted analysis

A weighted analysis is specified by a line with the following space separated entries:

(a)
The code WEIGHT at the beginning of the line.
(b)
The name of the column in the data containing the weight.
(c)
Optionally, for standard uni- and multivariate analyses, this can be followed by a code to select the type of weighting to apply:
none is the default: each record is multiplied by the weight given prior to the analysis.
DESIGN causes entries of “1” in the design matrices to be replaced with the weight specified (observations are not scaled).
RESID multiplies the diagonal elements of the residual covariance matrix pertaining to the observation with the weight given; in addition, any non-zero elements for a pair of records (on the same individual) are scaled with the geometric mean of the corresponding pair of weights (i.e. √wiwj-  ).

EXAMPLE: Consider an animal with two records and corresponding weights w1 = 0.8  and w2 = 2  . Let the unweighted residual covariance matrix be

(  10  − 5)                  ( 0.1143  0.0286 )
  −5   20      with inverse      0.0286  0.0571
The weighted matrix is then
(     8  − 6.325 )               ( 0.14286  0.02259 )
  −6.325      40      with inverse    0.02259  0.02857

4.10.6.2 Covariables that are zero

WOMBAT now checks for covariables which have a value of zero. If you fit a covariable which has such values, it expects you to confirm that this is intentional. This is done through a single line with space separated entries:

(a)
The code COVZER at the beginning of the line.
(b)
The full name of the covariable as given in the model statement (i.e. including brackets, number of coefficients and, if applicable, type).
(c)
A three letter option: FIT to confirm that a value of zero for this covariable is a valid code, or IGN to skip such records (any other option will lead to a programmed error stop).

4.10.6.3 Animals that are clones

WOMBAT will stop if it encounters non-consecutive records for the same subject in different parts of the data file, assuming that the data has not been sorted as required. However, there are situations where this is a ‘planned’ scenario, for example to fit data on clones, where we want the same genetic effect but different residual effects for members of a clone. Hence, this check can be switched off by including a line with the code CLONES in the SPECIAL block.

4.10.6.4 Standard errors for fixed and random effect solutions

WOMBAT reports solutions for fixed and random effects fitted – these are calculated as a by-product by all algorithms to obtain REML estimates of variance components. However, standard deviations/errors are only reported for those algorithms which invert the coefficient matrix in the mixed model equations. An option is provided to enforce this – this is invoked by a line (in a SPECIAL block) with the code FORCE-SE. If standard errors have been obtained automatically, this has no effect. Otherwise, it forces calculation of the inverse of the coefficient matrix at convergence.

N.B.: For large analyses, this can take quite some extra time.

4.10.6.5 Repeated records

WOMBAT attempts to check the data for records representing repeated measurements for a trait and individual, and stops when this is fairly low. This can be overridden to some extent: Analyses with repeated records for a trait for less than 20% and more than 10% of individuals can be enabled by a single line with the code REPEAT at the beginning of the line.

Standard multivariate analyses (MUV) with repeated records can represent a somewhat difficult scenario, as proper modelling of the residual covariance structure relies on differentiating between records for different traits are taken simultaneously and thus should be assumed to be subject to a common,temporary environmental (residual) covariance, and which should not. There are four strategies which can be chosen by specifying a single line with the following space separated entries:

(a)
The code RPTCOV at the beginning of the line.
(b)
A seven-letter qualifier to select he strategy to be used. Valid terms are:
INDESCR to fit a residual covariance σEij  between all observations for traits i  and j  for an individual (this was the previous default).
ALIGNED to fit a residual covariance σ
 Eij  only between obervations with the same running running number, e.g. the first record for trait i  and the first record for trait j  (this is the current default).
ZEROALL for the case where all temporary environmental covariances σEij  are assumed to be zero. This should be accompanied by corresponding starting values for the residual covariance components.
TSELECT when none of the other options are appropriate and where the data file has a column with a ‘time of recording’ indicator (integer). The name of this column must be given on the same line, space-separated after TSELECT. This allows for proper identification of records for different traits taken at the same time – and thus assumed to have a non-zero, temporary environmental covariance – and those which are not when there is an arbitrary pattern of missing records.

EXAMPLE:

SPECIAL
   RPTCOV TSELECT  mtime
END

Specification of this option is required for multivariate analyses combining traits with single and repeated records.

4.10.6.6 Fitting SNP effects

When using the run option --snap, the covariable in the model representing SNP effects needs to be identified by a line in a SPECIAL block containing

(a)
The code QTLEFF at the beginning of the line.
(b)
Space separated, the full name of the covariable, as specified in the MODEL block (i.e. including brackets and the order of it).
(c)
Optional, for analyses estimating more than one regression coefficient: Space separated, the code COVOUT to specify that the upper triangle of the covariance matrix of regression coefficients is written out.

In addition, a COVZER statement may be required to ensure WOMBAT treats count of zero as valid covariable values.

For ‘house-keeping’ reasons, WOMBAT requires that these covariables are specified (in the MODEL block) before any other, ‘regular’ covariables to be fitted.

EXAMPLE:

SPECIAL
   QTLEFF  snp(1)
   COVZER  snp(1)  FIT
END

4.10.6.7 Sampling to approximate standard errors

The default in drawing samples from distribution of estimates of covariance matrices is to sample the parameters estimated, i.e. the elements of the Cholesky factor of the matrix of interest, and to construct the matrix samples from these. This yields samples within the parameter space. However, it can also yield substantial differences in the mean of samples and their variances from the ‘input’ values. Hence an option is provided to select sampling of the covariance matrices instead. This will yield means close to the REML estimates and estimates of sampling variances closer to those derived from the inverse of the average information matrix, but is likely to yield samples out of the parameter space.

To select a single covariance matrix to be sampled when approximating sampling variances (see 5.2.9) requires a single line in a SPECIAL block with at least two entries:

(a)
The code SAMPLEAI at the beginning of the line.
(b)
The name of the random effect as specified in the MODEL block (space separated).
(c)
Optionally, this can be followed (space separated) by the code COVAR to select sampling of covariance matrices.
(d)
Optionally again, COVAR can be followed (space separated) by the code TRUNC which invokes modification of samples with eigenvalues close to zero (truncating at 10−4  ) to ensure the sample has the rank specified.

EXAMPLE:

SPECIAL
   SAMPLEAI  animal
END

4.10.6.8 Fitting “social” genetic effects

To fit a model with an additional random effect representing individuals’ competitive genetic effects, a random regression model analysis has to be selected – fitting random effects at an order of 1 with a basis function of “1” (option ONE) yields an equivalent model to a standard, univariate analysis whilst providing the flexibility to model residual variances as heterogeneous. WOMBAT then requires a “social” group code to be supplied and collects the identities of the individuals in each group as part of its set-up steps.

Additional information required is to be supplied by a single line in the SPECIAL block. This should contain the following, space separated entries:

(a)
The code SOCIAL at the start of the line.
(b)
The name of the random effect representing the “social” genetic effect. This can be abbreviated by omitting the part in brackets specifying the order of fit and basis function (e.g. (1,ONE)).
(c)
The name of the fixed or extra effect representing the “social” group.
(d)
An Integer variable giving the maximum group size.
(e)
Optional: A Real variable giving the dilution factor d  to be used. If omitted, a default value of d = 0  (i.e. no dilution) is used.
(f)
Optional: If a non-zero dilution factor has been given, an additional, four-letter code is recognised which specifies the “target group size” for which competitive genetic variance components are to be estimated. Codes available are NTWO which targets a group size of 2  (by using an entry of [1∕(n− 1)]d  in the corresponding design matrix) and NBAR which targets the average group size (using coefficients [(¯n− 1)∕(n− 1)]d  in the design matrix; see ? ]). If omitted, the default used is NTWO.

For each run where a dilution factor has been specified, WOMBAT appends the value of the dilution factor used together with the corresponding maximum log likelihood to a file in the working directory named LogL4Quapprox.dat (see 7.3.8). These represent points on the profile likelihood curve for the dilution factor, and a quadratic approximation can be used to determine its maximum. The run time option --quapp is provided to assist in this task.

4.10.6.9 Fitting ‘explicit’ genetic groups

see example 18

4.10.6.10 In core storage

The default for WOMBAT is to repeatedly read data and relationship information from disk so as to minimize memory requirements. In-core storage of selected information can be specified with a line in a SPECIAL block. This should begin with the code INCORE followed (space separated) by one or more qualifier(s) choosing which information is to be stored in core. Valid qualifiers are NRM (inverse of numerator relationship matrix), GIN (*.gin matrix), DATA (recoded data), RAW (raw data) and ALL (all four).

EXAMPLE:

SPECIAL
       ...
   INCORE GIN DATA
       ...
END

4.10.6.11 Calculation of alternative outlier model statistics

Calculation of alternative outlier model (AOM) statistics for residuals or random effects fitted is invoked by specifying the option AOM-RES or AOM-RND by itself on a line in a SPECIAL block in the parameter file.

EXAMPLE:

SPECIAL
       ...
   AOM-RES
       ...
END

The AOM options are implemented for uni- and full rank standard multivariate analyses only; they are ignored for reduced rank (PC) analyses and random regression models. Calculations are carried out for one observation or random effects level at a time. For large models or cases with dense coefficient matrices C, this can take quite a long time.

AOM statistics for residual effects,      ∘ ---------------------------
R− 1ˆe∕  Dia(R −1 − R −1WC − 1W ′R −1)  , are reported together with residuals in Residuals.dat. This file has one row per record with up to seven columns. Columns 1 to 3 are written out for all analyses and contain the residual, the predicted observation and the actual observation (for reference). Specifying AOM-RES adds the diagonal element of the projection matrix (p
 ii  ; column 4), the respective element of R −1ˆe  (the nominator of the AOM statistic; column 5) and the AOM statistic (column 6). For univariate analyses, the corresponding diagonal element of the “Hat” matrix (σ2[1− σ2p ]
         ii  with σ2  the residual variance) is given in addition (column 7).

AOM statistics for random effects,       ∘ ----------------------
G− 1uˆ∕  Dia(G −1 − G −1CZZG −1)  , are reported alongside the corresponding solutions in the RnSoln_*.dat files. This file has up to 10 columns. If AOM-RND has been specified, the last three columns give the diagonal element of the covariance matrix (T_ii), the element of G −1ˆu  ((G{-1}u)_i) and the AOM statistic, respectively. Where the denominator is essentially zero (e.g. for additive genetic effects for parent without records), the AOM statistics is set to zero.

4.10.6.12 Analyses calculating and using a Gaussian Kernel matrix

Kernel based prediction using genomic information is attracting increasing interest. This can be parameterised as

K(xi,xj) = exp(dij2∕θ2)
To facilitate estimation of the bandwidth parameter, WOMBAT has been adapted to calculate the inverse of the kernel matrix for a chosen value of θ  , with dij  the ’distance’ for individuals i  and j  .

To fit a Gaussian kernel matrix as ‘relationship matrix’, the random effect which this matrix pertains to needs to be specified with the covariance option GIN. The corresponding *.gin file should follow the same format as for other effects with such covariance structure, except that the elements given should be the squared distances, d2ij  and that the diagonals of 0  can be omitted. Only rudimentary checks of the input values are performed, e.g. WOMBAT stops if a zero distance between a pair of individuals is found, as this will result in a non-positive definite matrix.

The bandwidth parameter to be used needs to be specified in a SPECIAL block. This should contain a line starting with the code KERNEL followed (space-separated) by the name of the random effect as given in the MODEL block, and the value of θ  . E.g. for random effect animal and θ = 1.5

EXAMPLE:

SPECIAL
       ...
   KERNEL animal   1.5
       ...
END

WOMBAT will then construct the kernel matrix for the given value of θ  and attempt to invert it, as well as calculate its log determinant. K is assumed to be dense and is automatically stored ‘in-core’. Inversion is carried out using LAPACK [? ] routine DPFTRF [? ] which requires a positive definite matrix.

On completion of the estimation step for a given value of θ  , it is written to the file LogL4Quapprox.dat together with the corresponding maximum log likelihood. To determine the next value on the profile likelihood for θ  to be evaluated, WOMBAT can be run with the command line option --quapp (see 5.2.11) which performs a quadratic approximation of the profile log likelihood for θ  .

4.10.6.13 Writing out correlations for random regression analyses

By default, WOMBAT calculates and writes out a small number of correlations on the ’original scale’ for the estimated covariance functions. These are primarily given to assist with checking your own calculations to obtain the correlations you are interested in - covariances and correlations are easily calculated using the file of evaluated basis functions provided.

The complete correlation matrix is not provided by default as this might result in big output files and many calculations needed. However, if you really want correlations for all values of the control value, you can specify this by a single line in a SPECIAL block containing the code RRCORR-ALL. If the analysis fits a single control variable, this writes out not only RanRegCorrels.dat, but also writes an additional file RanRegCorrAll.dat (see 7.2.4) with the same information in a format more convenient for further processing, e.g. plotting.

4.10.7 Penalized estimation

WOMBAT can carry out estimation imposing a penalty on the log likelihood function to improve the sampling properties of estimates. Three types of penalties are accommodated:

1.
A penalty regressing the so-called canonical eigenvalues towards their mean, as described by Meyer and Kirkpatrick [21], also referred to as ‘bending’. This can be done for the eigenvalues directly or after transformation to logarithmic scale. In addition, there is a choice of parameterisation.

As the canonical transformation involves only two matrices, this type of penalty is restricted to multivariate analyses fitting a simple animal model or random regression analyses fitting animals’ genetic and permanent environmental effects only.

2.
A penalty proportional to the matrix divergence between a covariance matrix to be estimated and a specified target matrix. This will shrink the estimate towards the target. Options are available to shrink a covariance matrix or a corresponding correlation matrix. A particular choice of target is the phenotypic covariance or correlation matrix estimated in a standard (unpenalized) analysis [see 22] – hence WOMBAT provides the facility to write out such target matrices. While this could, in principle, be applied to multiple matrices, the current implementation is restricted to a single matrix.
3.
A penalty on a correlation matrix,  shrinking the corresponding partial auto-correlations towards zero or their phenotypic counterparts. The penalty is derived assuming a (shifted) Beta distribution and its stringency is regulated through choice of the so-called prior effective sample size.

Penalized estimation is invoked by one or more lines in a SPECIAL block. It generates several additional output files, described in 7.2.11.

4.10.7.1 Selecting a penalty

Penalized estimation is specified by a line with space separated entries:

(a)
The code PENALTY at the beginning of the line.
(b)
A code specifying what type of penalty is to be applied. Values recognised are:
CANEIG selects a penalty on the canonical eigenvalues.
COVARM specifies shrinkage of a covariance matrix.
CORREL chooses shrinkage of a correlation matrix towards a given target correlation matrix.
KANEIG selects a penalty on the canonical eigenvalues, obtained assuming a Beta distribution.
KORREL selects a penalty on individual correlations, assumed to have a Beta distribution.
PACORR chooses shrinkage of a the matrix of partial auto-correlations for a random effect towards the corresponding phenotypic or an identity matrix.

The following variables depend on the form of penalty chosen.

For CANEIG:

(c)
A character variable to specify the ‘scale’ to be used. Options available are:
ORG selects a penalty on the canonical eigenvalues on the original scale.
LOG specifies a penalty on the canonical eigenvalues after transformation to logarithmic scale.
LOGB is similar to LOG but a penalty is applied to both log(λi)  and log(1 − λi)  , where λi  denotes the i− th canonical eigenvalue.
(d)
A character variable to specify the parametrisation of λi  to be used. Options available are:
ORG estimate λi  directly.
LOG estimate log(λi)  , i.e. λi  transformed to logarithmic scale.
TIC estimate log(λi∕(1 − λi))  , i.e. λi  transformed to logistic scale.

This variable is optional; if omitted, it is set to the same value as that given for the scale of the penalty (LOG for LOGB).

(e)
A real or integer variable specifying either the tuning factor ψ  to be used, or, if estimates for multiple tuning factors are to be obtained, the number of ψ  ’s to be read subsequently or the number of different ranges of ψ  ’s to be stepped through:
  • If a real value is given, WOMBAT interprets this as the (single) tuning factor to be used, and expects no further input.
    N.B. This value must contain a decimal point to be recognised as such!

    EXAMPLE:

    SPECIAL
       PENALTY  CANEIG  LOG 5.0
    END

    specifies a penalty on the canonical eigenvalues transformed to logarithmic scale for a tuning factor of ψ = 5.0  .

  • If a positive integer is found, this is interpreted as the number of different values for ψ  to be used. These are to be read in a space-separated list, beginning on the next line.

    EXAMPLE:

    SPECIAL
       PENALTY  CANEIG  ORG  6
       0.5  1.0  2.5  5.2  8.7  11.0
    END
  • If a negative integer, − n  is specified, this is interpreted as the number of ranges of ψ  to be considered. These are expected to be found on the following n  lines, with one triplet of space-separated real values per line, consisting of starting value, end value and step size.

    EXAMPLE:

    SPECIAL
       PENALTY  CANEIG  LOG TIC -2
       0.0  3.0  0.5
       5.0  8.0  1.0
    END
  • Finally, a value of 0  (integer) specifies that the values of ψ  are to be read form a separate file. In this case, the name of this file needs to be given (space-separated from 0  ) on the same line. This file should give the tuning factors to be used, with each value given on a separate line.

All values of ψ  given should be in ascending order.

For COVARM and CORREL:

(c)
An optional character variable:
PHENV This option specifies that the estimate of the phenotypic covariance or correlation matrix is to be written out to the file PenTargetMatrix.
It is best used in a preliminary step with a single ψ = 0  to generate the matrix to be used in subsequent penalized runs where we wish to shrink towards this matrix.
The default is for this file to exist and to be read.

EXAMPLE:

SPECIAL
   PENALTY  COVARM PHENV  animal  0.0
END

KLDIV The default penalty on a matrix of size q× q  is
P∝ C log |Σ| + tr(Σ1T)
with T the shrinkage target and C = (ψ + q+ 1)∕ψ  . The latter value originates from the assumption of an Inverse Wishart distribution for Σ  . This option sets C = 1  , i.e. applies a penalty proportional to the Kullback-Leibler divergence between Σ   and T.

EXAMPLE:

SPECIAL
   PENALTY  COVARM  KLDIV  animal 7.50
END

(d)
The name of the random effect covariance matrix to be penalized. This must match the name given in a VARIANCE statement (see 4.9) earlier in the parameter file.

EXAMPLE:

SPECIAL
   PENALTY  CORREL  animal -1
   0.0 5.0 1.0
END
(e)
Information on tuning parameter(s) to be used, as for CANEIG (see above).

For KANEIG: see example 19 For KORREL: see example 19 For PACORR:

(c)
An optional character variable:
PHENV This specifies that the shrinkage target is the phenotypic matrix of partial auto-correlations
IDENT This specifies that the shrinkage target is the identity matrix

If omitted, the default shrinkage target is the identity matrix.

(d)
The name of the random effect covariance matrix to be penalized. This must match the name given in a VARIANCE statement (see 4.9) earlier in the parameter file.
(e)
A real value specifying the ‘effective sample size’ of the Beta prior (no multiple values allowed for this form of penalty).

EXAMPLE:

SPECIAL
   PENALTY PACORR  animal  4.0
END

4.10.7.2 Stopping after a given change in likelihood

If multiple tuning factors are given, it is sometimes desirable to stop once the deviation in the unpenalized log likelihood from the maximum (at ψ = 0  ) has exceeded a certain limit. This can be specified by an additional line with three space-separated entries:

(a)
The code PENALTY at the beginning of the line.
(b)
The code LIMIT.
(c)
A real variable with the value for the limit.

EXAMPLE:

SPECIAL
   PENALTY  CORREL  KLDIV  animal -2
   0.0  5.0   0.5
   6.0  15.0  1.0
   PENALTY  LIMIT  3.315
END

For this option to work, the first tuning factor given has to be ψ = 0  !

HINT: Use run time option -–-valid (5.3.2) for validation steps when using cross-validation to estimate the value for ψ  to be used.