4.16 Special information

A place to collect miscellaneous options which do not fit into the categories above is provided through a block entry beginning with a line SPECIAL, and ending with a line END.

Options are available to address the following problems:

4.16.1 Weighted analysis

A weighted analysis is specified by a line with the following space separated entries:

(a)
The code WEIGHT at the beginning of the line.
(b)
The name of the column in the data containing the weight.
(c)
Optionally, for standard uni- and multivariate analyses, this can be followed by a code to select the type of weighting to apply:
none is the default: each record is multiplied by the weight given prior to the analysis.
DESIGN causes entries of “1” in the design matrices to be replaced with the weight specified (observations are not scaled).
RESID multiplies the diagonal elements of the residual covariance matrix pertaining to the observation with the weight given; in addition, any non-zero elements for a pair of records (on the same individual) are scaled with the geometric mean of the corresponding pair of weights (i.e. √-----
 wiwj  ).

EXAMPLE: Consider an animal with two records and corresponding weights w1 = 0.8  and w2 = 2  . Let the unweighted residual covariance matrix be

(        )                  (              )
   10  − 5      with inverse      0.1143  0.0286
  −5   20                     0.0286  0.0571
The weighted matrix is then
(     8  − 6.325 )               ( 0.14286  0.02259 )
  −6.325      40      with inverse    0.02259  0.02857

4.16.2 Covariables that are zero

WOMBAT now checks for covariables which have a value of zero. If you fit a covariable which has such values, it expects you to confirm that this is intentional. This is done through a single line with space separated entries:

(a)
The code COVZER at the beginning of the line.
(b)
The full name of the covariable as given in the model statement (i.e. including brackets, number of coefficients and, if applicable, type).
(c)
A three letter option: FIT to confirm that a value of zero for this covariable is a valid code, or IGN to skip such records (any other option will lead to a programmed error stop).

4.16.3 Animals that are clones

WOMBAT will stop if it encounters non-consecutive records for the same subject in different parts of the data file, assuming that the data has not been sorted as required. However, there are situations where this is a ‘planned’ scenario, for example to fit data on clones, where we want the same genetic effect but different residual effects for members of a clone. Hence, this check can be switched off by including a line with the code CLONES in the SPECIAL block.

4.16.4 Standard errors for fixed and random effect solutions

WOMBAT reports solutions for fixed and random effects fitted – these are calculated as a by-product by all algorithms to obtain REML estimates of variance components. However, standard deviations/errors are only reported for those algorithms which invert the coefficient matrix in the mixed model equations. An option is provided to enforce this – this is invoked by a line (in a SPECIAL block) with the code FORCE-SE. If standard errors have been obtained automatically, this has no effect. Otherwise, it forces calculation of the inverse of the coefficient matrix at convergence.

N.B.: For large analyses, this can take quite some extra time.

4.16.5 Repeated records

WOMBAT attempts to check the data for records representing repeated measurements for a trait and individual, and stops when this is fairly low. This can be overridden to some extent: Analyses with repeated records for a trait for less than 20% and more than 10% of individuals can be enabled by a single line with the code REPEAT at the beginning of the line.

Standard multivariate analyses (MUV) with repeated records can represent a somewhat difficult scenario, as proper modelling of the residual covariance structure relies on differentiating between records for different traits are taken simultaneously and thus should be assumed to be subject to a common,temporary environmental (residual) covariance, and which should not. There are four strategies which can be chosen by specifying a single line with the following space separated entries:

(a)
The code RPTCOV at the beginning of the line.
(b)
A seven-letter qualifier to select he strategy to be used. Valid terms are:
INDESCR to fit a residual covariance σEij  between all observations for traits i  and j  for an individual (this was the previous default).
ALIGNED to fit a residual covariance σEij  only between obervations with the same running running number, e.g. the first record for trait i  and the first record for trait j  (this is the current default).
ZEROALL for the case where all temporary environmental covariances σEij  are assumed to be zero. This should be accompanied by corresponding starting values for the residual covariance components.
TSELECT when none of the other options are appropriate and where the data file has a column with a ‘time of recording’ indicator (integer). The name of this column must be given on the same line, space-separated after TSELECT. This allows for proper identification of records for different traits taken at the same time – and thus assumed to have a non-zero, temporary environmental covariance – and those which are not when there is an arbitrary pattern of missing records.

EXAMPLE:

SPECIAL
   RPTCOV TSELECT  mtime
END

Specification of this option is required for multivariate analyses combining traits with single and repeated records.

4.16.6 Fitting SNP effects

When using the run option --snap, the covariable in the model representing SNP effects needs to be identified by a line in a SPECIAL block containing

(a)
The code QTLEFF at the beginning of the line.
(b)
Space separated, the full name of the covariable, as specified in the MODEL block (i.e. including brackets and the order of it).
(c)
Optional, for analyses estimating more than one regression coefficient: Space separated, the code COVOUT to specify that the upper triangle of the covariance matrix of regression coefficients is written out.

In addition, a COVZER statement may be required to ensure WOMBAT treats count of zero as valid covariable values.

For ‘house-keeping’ reasons, WOMBAT requires that these covariables are specified (in the MODEL block) before any other, ‘regular’ covariables to be fitted.

EXAMPLE:

SPECIAL
   QTLEFF  snp(1)
   COVZER  snp(1)  FIT
END

4.16.7 Sampling to approximate standard errors

The default in drawing samples from distribution of estimates of covariance matrices is to sample the parameters estimated, i.e. the elements of the Cholesky factor of the matrix of interest, and to construct the matrix samples from these. This yields samples within the parameter space. However, it can also yield substantial differences in the mean of samples and their variances from the ‘input’ values. Hence an option is provided to select sampling of the covariance matrices instead. This will yield means close to the REML estimates and estimates of sampling variances closer to those derived from the inverse of the average information matrix, but is likely to yield samples out of the parameter space.

To select a single covariance matrix to be sampled when approximating sampling variances (see 5.2.9) requires a single line in a SPECIAL block with at least two entries:

(a)
The code SAMPLEAI at the beginning of the line.
(b)
The name of the random effect as specified in the MODEL block (space separated).
(c)
Optionally, this can be followed (space separated) by the code COVAR to select sampling of covariance matrices.
(d)
Optionally again, COVAR can be followed (space separated) by the code TRUNC which invokes modification of samples with eigenvalues close to zero (truncating at   −4
10  ) to ensure the sample has the rank specified.

EXAMPLE:

SPECIAL
   SAMPLEAI  animal
END

4.16.8 Fitting “social” genetic effects

To fit a model with an additional random effect representing individuals’ competitive genetic effects, a random regression model analysis has to be selected – fitting random effects at an order of 1 with a basis function of “1” (option ONE) yields an equivalent model to a standard, univariate analysis whilst providing the flexibility to model residual variances as heterogeneous. WOMBAT then requires a “social” group code to be supplied and collects the identities of the individuals in each group as part of its set-up steps.

Additional information required is to be supplied by a single line in the SPECIAL block. This should contain the following, space separated entries:

(a)
The code SOCIAL at the start of the line.
(b)
The name of the random effect representing the “social” genetic effect. This can be abbreviated by omitting the part in brackets specifying the order of fit and basis function (e.g. (1,ONE)).
(c)
The name of the fixed or extra effect representing the “social” group.
(d)
An Integer variable giving the maximum group size.
(e)
Optional: A Real variable giving the dilution factor d  to be used. If omitted, a default value of d = 0  (i.e. no dilution) is used.
(f)
Optional: If a non-zero dilution factor has been given, an additional, four-letter code is recognised which specifies the “target group size” for which competitive genetic variance components are to be estimated. Codes available are NTWO which targets a group size of 2  (by using an entry of [1∕(n− 1)]d  in the corresponding design matrix) and NBAR which targets the average group size (using coefficients [(¯n− 1)∕(n− 1)]d  in the design matrix; see Bijma [3]). If omitted, the default used is NTWO.

For each run where a dilution factor has been specified, WOMBAT appends the value of the dilution factor used together with the corresponding maximum log likelihood to a file in the working directory named LogL4Quapprox.dat (see 7.3.8). These represent points on the profile likelihood curve for the dilution factor, and a quadratic approximation can be used to determine its maximum. The run time option --quapp is provided to assist in this task.

4.16.9 Fitting ‘explicit’ genetic groups

see example 18

4.16.10 In core storage

The default for WOMBAT is to repeatedly read data and relationship information from disk so as to minimize memory requirements. In-core storage of selected information can be specified with a line in a SPECIAL block. This should begin with the code INCORE followed (space separated) by one or more qualifier(s) choosing which information is to be stored in core. Valid qualifiers are NRM (inverse of numerator relationship matrix), GIN (*.gin matrix), DATA (recoded data), RAW (raw data) and ALL (all four).

EXAMPLE:

SPECIAL
       ...
   INCORE GIN DATA
       ...
END

4.16.11 Calculation of alternative outlier model statistics

Calculation of alternative outlier model (AOM) statistics for residuals or random effects fitted is invoked by specifying the option AOM-RES or AOM-RND by itself on a line in a SPECIAL block in the parameter file.

EXAMPLE:

SPECIAL
       ...
   AOM-RES
       ...
END

The AOM options are implemented for uni- and full rank standard multivariate analyses only; they are ignored for reduced rank (PC) analyses and random regression models. Calculations are carried out for one observation or random effects level at a time. For large models or cases with dense coefficient matrices C, this can take quite a long time.

AOM statistics for residual effects,  − 1 ∘ ------−1----−1---−-1--′-−1-
R  ˆe∕  Dia(R   − R   WC    W R   )  , are reported together with residuals in Residuals.dat. This file has one row per record with up to seven columns. Columns 1 to 3 are written out for all analyses and contain the residual, the predicted observation and the actual observation (for reference). Specifying AOM-RES adds the diagonal element of the projection matrix (pii  ; column 4), the respective element of   −1
R   ˆe  (the nominator of the AOM statistic; column 5) and the AOM statistic (column 6). For univariate analyses, the corresponding diagonal element of the “Hat” matrix (σ2[1− σ2pii]  with σ2  the residual variance) is given in addition (column 7).

AOM statistics for random effects,  − 1  ∘ -----−1----−1--ZZ--−1-
G  uˆ∕  Dia(G   − G   C   G   )  , are reported alongside the corresponding solutions in the RnSoln_*.dat files. This file has up to 10 columns. If AOM-RND has been specified, the last three columns give the diagonal element of the covariance matrix (T_ii), the element of   −1
G   ˆu  ((G{-1}u)_i) and the AOM statistic, respectively. Where the denominator is essentially zero (e.g. for additive genetic effects for parent without records), the AOM statistics is set to zero.

4.16.12 Analyses using a Gaussian Kernel matrix

Kernel based prediction using genomic information is attracting increasing interest. This can be parameterised as

K(xi,xj) = exp(dij2∕θ2)

To facilitate estimation of the bandwidth parameter, WOMBAT has been adapted to calculate the inverse of the kernel matrix for a chosen value of θ  , with d
 ij  the ’distance’ for individuals i  and j  .

To fit a Gaussian kernel matrix as ‘relationship matrix’, the random effect which this matrix pertains to needs to be specified with the covariance option GIN. The corresponding *.gin file should follow the same format as for other effects with such covariance structure, except that the elements given should be the squared distances,  2
dij  and that the diagonals of 0  can be omitted. Only rudimentary checks of the input values are performed, e.g. WOMBAT stops if a zero distance between a pair of individuals is found, as this will result in a non-positive definite matrix.

The bandwidth parameter to be used needs to be specified in a SPECIAL block. This should contain a line starting with the code KERNEL followed (space-separated) by the name of the random effect as given in the MODEL block, and the value of θ  . E.g. for random effect animal and θ = 1.5

EXAMPLE:

SPECIAL
       ...
   KERNEL animal   1.5
       ...
END

WOMBAT will then construct the kernel matrix for the given value of θ  and attempt to invert it, as well as calculate its log determinant. K is assumed to be dense and is automatically stored ‘in-core’. Inversion is carried out using LAPACK [2] routine DPFTRF [16] which requires a positive definite matrix.

On completion of the estimation step for a given value of θ  , it is written to the file LogL4Quapprox.dat together with the corresponding maximum log likelihood. To determine the next value on the profile likelihood for θ  to be evaluated, WOMBAT can be run with the command line option --quapp (see 5.2.12) which performs a quadratic approximation of the profile log likelihood for θ  .

4.16.13 Correlations for random regression analyses

By default, WOMBAT calculates and writes out a small number of correlations on the ’original scale’ for the estimated covariance functions. These are primarily given to assist with checking your own calculations to obtain the correlations you are interested in - covariances and correlations are easily calculated using the file of evaluated basis functions provided.

The complete correlation matrix is not provided by default as this might result in big output files and many calculations needed. However, if you really want correlations for all values of the control value, you can specify this by a single line in a SPECIAL block containing the code RRCORR-ALL. If the analysis fits a single control variable, this writes out not only RanRegCorrels.dat, but also writes an additional file RanRegCorrAll.dat (see 7.2.4) with the same information in a format more convenient for further processing, e.g. plotting.

4.16.14 Genomic relationship matrices

See documentation in Example 20.

4.16.15 Penalized estimation

WOMBAT can carry out estimation imposing a penalty on the log likelihood function to improve the sampling properties of estimates. Three types of penalties are accommodated:

1.
A penalty regressing the so-called canonical eigenvalues towards their mean, as described by Meyer and Kirkpatrick [32], also referred to as ‘bending’. This can be done for the eigenvalues directly or after transformation to logarithmic scale. In addition, there is a choice of parameterisation.

As the canonical transformation involves only two matrices, this type of penalty is restricted to multivariate analyses fitting a simple animal model or random regression analyses fitting animals’ genetic and permanent environmental effects only.

2.
A penalty proportional to the matrix divergence between a covariance matrix to be estimated and a specified target matrix. This will shrink the estimate towards the target. Options are available to shrink a covariance matrix or a corresponding correlation matrix. A particular choice of target is the phenotypic covariance or correlation matrix estimated in a standard (unpenalized) analysis [see 33] – hence WOMBAT provides the facility to write out such target matrices. While this could, in principle, be applied to multiple matrices, the current implementation is restricted to a single matrix.
3.
A penalty on a correlation matrix,  shrinking the corresponding partial auto-correlations towards zero or their phenotypic counterparts. The penalty is derived assuming a (shifted) Beta distribution and its stringency is regulated through choice of the so-called prior effective sample size.

Penalized estimation is invoked by one or more lines in a SPECIAL block. It generates several additional output files, described in 7.2.11.

4.16.15.1 Selecting a penalty

Penalized estimation is specified by a line with space separated entries:

(a)
The code PENALTY at the beginning of the line.
(b)
A code specifying what type of penalty is to be applied. Values recognised are:
CANEIG selects a penalty on the canonical eigenvalues.
COVARM specifies shrinkage of a covariance matrix.
CORREL chooses shrinkage of a correlation matrix towards a given target correlation matrix.
KANEIG selects a penalty on the canonical eigenvalues, obtained assuming a Beta distribution.
KORREL selects a penalty on individual correlations, assumed to have a Beta distribution.
PACORR chooses shrinkage of a the matrix of partial auto-correlations for a random effect towards the corresponding phenotypic or an identity matrix.

The following variables depend on the form of penalty chosen.

For CANEIG:

(c)
A character variable to specify the ‘scale’ to be used. Options available are:
ORG selects a penalty on the canonical eigenvalues on the original scale.
LOG specifies a penalty on the canonical eigenvalues after transformation to logarithmic scale.
LOGB is similar to LOG but a penalty is applied to both log(λi)  and log(1 − λi)  , where λi  denotes the i− th canonical eigenvalue.
(d)
A character variable to specify the parametrisation of λi  to be used. Options available are:
ORG estimate λi  directly.
LOG estimate log(λi)  , i.e. λi  transformed to logarithmic scale.
TIC estimate log(λi∕(1 − λi))  , i.e. λi  transformed to logistic scale.

This variable is optional; if omitted, it is set to the same value as that given for the scale of the penalty (LOG for LOGB).

(e)
A real or integer variable specifying either the tuning factor ψ  to be used, or, if estimates for multiple tuning factors are to be obtained, the number of ψ  ’s to be read subsequently or the number of different ranges of ψ  ’s to be stepped through:
  • If a real value is given, WOMBAT interprets this as the (single) tuning factor to be used, and expects no further input.
    N.B. This value must contain a decimal point to be recognised as such!

    EXAMPLE:

    SPECIAL
       PENALTY  CANEIG  LOG 5.0
    END

    specifies a penalty on the canonical eigenvalues transformed to logarithmic scale for a tuning factor of ψ = 5.0  .

  • If a positive integer is found, this is interpreted as the number of different values for ψ  to be used. These are to be read in a space-separated list, beginning on the next line.

    EXAMPLE:

    SPECIAL
       PENALTY  CANEIG  ORG  6
       0.5  1.0  2.5  5.2  8.7  11.0
    END
  • If a negative integer, − n  is specified, this is interpreted as the number of ranges of ψ  to be considered. These are expected to be found on the following n  lines, with one triplet of space-separated real values per line, consisting of starting value, end value and step size.

    EXAMPLE:

    SPECIAL
       PENALTY  CANEIG  LOG TIC -2
       0.0  3.0  0.5
       5.0  8.0  1.0
    END
  • Finally, a value of 0  (integer) specifies that the values of ψ  are to be read form a separate file. In this case, the name of this file needs to be given (space-separated from 0  ) on the same line. This file should give the tuning factors to be used, with each value given on a separate line.

All values of ψ  given should be in ascending order.

For COVARM and CORREL:

(c)
An optional character variable:
PHENV This option specifies that the estimate of the phenotypic covariance or correlation matrix is to be written out to the file PenTargetMatrix.
It is best used in a preliminary step with a single ψ = 0  to generate the matrix to be used in subsequent penalized runs where we wish to shrink towards this matrix.
The default is for this file to exist and to be read.

EXAMPLE:

SPECIAL
   PENALTY  COVARM PHENV  animal  0.0
END

KLDIV The default penalty on a matrix of size q× q  is
P∝ C log |Σ| + tr(Σ1T)

with T the shrinkage target and C = (ψ + q+ 1)∕ψ  . The latter value originates from the assumption of an Inverse Wishart distribution for Σ  . This option sets C = 1  , i.e. applies a penalty proportional to the Kullback-Leibler divergence between Σ   and T.

EXAMPLE:

SPECIAL
   PENALTY  COVARM  KLDIV  animal 7.50
END

(d)
The name of the random effect covariance matrix to be penalized. This must match the name given in a VARIANCE statement (see 4.10) earlier in the parameter file.

EXAMPLE:

SPECIAL
   PENALTY  CORREL  animal -1
   0.0 5.0 1.0
END
(e)
Information on tuning parameter(s) to be used, as for CANEIG (see above).

For KANEIG: see example 19 For KORREL: see example 19 For PACORR:

(c)
An optional character variable:
PHENV This specifies that the shrinkage target is the phenotypic matrix of partial auto-correlations
IDENT This specifies that the shrinkage target is the identity matrix

If omitted, the default shrinkage target is the identity matrix.

(d)
The name of the random effect covariance matrix to be penalized. This must match the name given in a VARIANCE statement (see 4.10) earlier in the parameter file.
(e)
A real value specifying the ‘effective sample size’ of the Beta prior (no multiple values allowed for this form of penalty).

EXAMPLE:

SPECIAL
   PENALTY PACORR  animal  4.0
END

4.16.15.2 Stopping after a given change in likelihood

If multiple tuning factors are given, it is sometimes desirable to stop once the deviation in the unpenalized log likelihood from the maximum (at ψ = 0  ) has exceeded a certain limit. This can be specified by an additional line with three space-separated entries:

(a)
The code PENALTY at the beginning of the line.
(b)
The code LIMIT.
(c)
A real variable with the value for the limit.

EXAMPLE:

SPECIAL
   PENALTY  CORREL  KLDIV  animal -2
   0.0  5.0   0.5
   6.0  15.0  1.0
   PENALTY  LIMIT  3.315
END

For this option to work, the first tuning factor given has to be ψ = 0  !

HINT: Use run time option -–-valid (5.3.2) for validation steps when using cross-validation to estimate the value for ψ  to be used.