Chapter 9
Worked examples

A number of worked examples are provided to illustrate the use of WOMBAT and, in particular, show how to set up the parameter files.

Installation for the suite of examples is described in section 3.1.4 . This generates the directory WOMBAT/Examples with subdirectories Examplen (n = 1,…,9).

Each subdirectory contains the data and pedigree files for a particular example, a file WhatIsIt with a brief description of the example, and one or subdirectories for individual runs, A, B, C, .

Each ‘run’ directory (A, B, ) contains :

(a)
A parameter file (.par)
(b)
The file typescript (generated using the script command) which contains the screen output for the run.
Run time options used can be found at the top of this file.
(c)
The numerous output files generated by WOMBAT.

N.B.: The example data sets have been selected for ease of demonstration, and to allow fairly rapid replication of the example runs. Clearly, most of the data sets used are too small to support estimation of all the parameters fitted, in particular for the higher dimensional analyses shown !

N.B.: Examples runs have been carried out on a 64-bit machine (Intel I7 processor, rated at 3.20Ghz) ; numbers obtained on 32-bit machine may vary slightly.

Note further that all the example files are Linux files - you may need to ’translate’ them to DOS format if you plan to run the examples under Windows.

Example 1

This shows a univariate analysis for a simple animal model, fitting a single fixed effect only.

Source: Simulated data; Example 1 from DfReml

Example 2

This shows a bivariate analysis for the case where the same model is fitted for both traits and all traits are recorded for all animals. The model of analysis includes 3 cross-classified fixed effects and an additional random effect

A
Analysis fitting an animal model
B
Analysis fitting a sire model

Source: Data from Edinburgh mouse lines; Example 2 from DfReml

Example 3

This example involves up to six repeated records for a single trait, recorded at different ages. The model of analysis is an animal model with a single fixed effect. Data are analysed :

A
Fitting a univariate ‘repeatability’ model, with age as a covariable
B
Fitting a multivariate analysis with 6 traits
C
Fitting a univariate random regression model

Source: Wokalup selection experiment; Example 3 from DfReml

Example 4

This example shows a four-variate analysis for a simple animal model. Runs show:

A
A ‘standard’ full rank analysis
B
A reduced rank analysis, fitting the first two principal components only
C
A reduced rank analysis using the EM algorithm

Source: Australian beef cattle field data

Example 5

Similar to example 4, but involving 6 traits. Runs show:

A
A ‘standard’ full rank analysis
B
A reduced rank analysis, fitting the first four principal components
C
An analysis fitting a factor-analytic structure for the genetic covariance matrix
D
A full rank analysis, illustrating use of penalized REML (’bending’) for a chosen tuning parameter

Source: Australian beef cattle data

Example 6

This example involves 4 measurements, with records on different sexes treated as different traits. This gives an eight-variate analysis, with 16 residual covariances equal to zero. The model of analysis is a simple animal model.

A
A full rank analysis with ‘good’ starting values
B
A full rank analysis with ‘bad’ starting values

Source: Australian beef cattle field data

Example 7

This example illustrates the analysis of 4 traits, subject to genetic and permanent environmental effects. The model of analysis involves several crossclassified fixed effects, nested covariables and different effects for different traits.

A
Univariate analysis for trait 1
B
Univariate analysis for trait 2
B1
As B but reworked by setting up NRM inverse externally to illustrate use of GIN option
B2
As B, but fitting fixed effects only and a user-defined basis function for the covariable dam age
C
Univariate analysis for trait 2, allowing for a non-zero direct-maternal genetic covariance
C1
As C but using GIN instead of NRM option
D
Bivariate analysis for traits 1 and 2
E
Bivariate analysis for traits 1 and 2, allowing for a non-zero direct-maternal genetic covariance
F
Trivariate analysis for traits 1, 2 and 3
G
Fourvariate analysis of all traits
H
Fourvariate analysis of all traits, not fitting maternal effects for trait 4
I
Reduced rank, fourvariate analysis of all traits, not fitting maternal effects for trait 4

Source: Wokalup selection experiment

Example 8

This is an example of a model where different random effects are fitted for different traits. It is a bivariate analysis of mature cow weights together with gestation length. Mature cow weight involves repeated records per animal, and a permanent environmental effect of the animal is thus fitted for this trait. Gestation length is treated as trait of the calf and assumed to be affected by both genetic and permanent environmental effects.

A
Standard model
B
Equivalent model, using the PEQ option for permanent environmental effects of the animal.

Source: Australian beef cattle field data

Example 9

This example illustrates random regression analyses fitting an additional random effect, using B-splines as basis functions and imposing rank restrictions on estimated covariance functions. Data are monthly records for weights of calves from birth to weaning.

A
Full rank analysis
A1
As A, but evaluating the basis functions externally to illustrate the use of user-defined basis functions.
B
Reduced rank analysis
C
Reduced rank analysis with different ranks

Source: Wokalup selection experiment

Example 10

In a recent paper, Wilson et al. [49] presented a tutorial on ‘animal model’ analyses of data from a wild population. This example replicates the analyses shown (and expands it by demonstrating how to evaluate likelihood fixing the genetic covariance at zero, in order to carry out a likelihood ratio test for this component).

The tutorial and all data (& pedigree) files used in this example are not included – you have to download these from their web site:

http://wildanimalmodels.org

A
Simple univariate analysis
B
Bivariate analysis
B1
Bivariate analysis, fixing the genetic covariance at zero
C
Repeated records model

Source: The Wild Animal Models Wiki

Example 11

This example demonstrates simple bi-variate random regression analyses.

A
Using a RR model to carry out a multivariate analysis with repeated records where the pattern of temporary environmental covariances is determined through the control variable.
A1
Multivariate analysis corresponding to A
B
Bi-variate RR analysis fitting a quadratic regression on Legendre polynomials and homogeneous measurement error variances.

Source: Simulated records

Example 13

This example illustrates multivariate analyses with repeated records per trait, especially some new features:

The data were obtained by simulating records for 4 traits recorded on 800 animals at 5 different times. A missing value indicator (999) is used to create different pattern of missing records - note that analyses in the different sub-directories analyze different columns in the data file.

A
Demonstrates an analysis without missing records, i.e. where all traits are recorded at the same time. This implies that there are non-zero error covariances between all traits and that the ALIGNED option is appropriate.
B
Shows the analysis when some records are missing, but in a systematic fashion: Traits 1 and 2 have records at all 5 times, but traits 3 and 4 are only recorded for times 1 and 2. As the ‘missing’ observations only occur for the later times, the option ALIGNED is still appropriate.
C
Similar to B, but measurements for traits 3 and 4 are taken at times 2 and 4. This means that a time of recording indicator needs to be used to model the residual covariance structure correctly. This is done specifying TSELECT together with the name of the column in the data file which contains the time variable.
C1
As C, but using a multivariate random regression analysis.
D
Illustrates the scenario where we have a trait with repeated records analysed together with traits with single records and where traits with single and repeated records are measured at different times so that
i)
there are no error covariances between these groups of traits and
ii)
that we can ‘use’ the error covariance to model covariances between traits due to permanent environmental effects of the animal.

For this example, we use records taken at times 1 to 4 for trait 1, and records taken at time 5 for traits 2 to 4. For this case a model fitting a permanent environmental effects due to the animal for trait 1 only together with the INDESCR option is appropriate. Estimates of the error covariances betwen trait 1 and traits 2, 3 and 4 then reflect the permanent environmental covariance, while estimates of the (co)variances among the latter represent the sum of temporary and permanent environmental covariances.

E
Shows the case where we have a trait with repeated records analysed together with traits with single records, but where the single records are taken at the same time as one of the repeated records, so that we need to model non-zero error covariances. Here we consider records for trait 1 at all 5 times, and records for traits 2 to 3 taken at time 5. Again we need the TSELECT option to model this properly. In addition, we need to use the equivalent model invoked via the PEQ option in order to separate temporary and permanent envirionmental covariances between trait 1 and the other traits. Note that permanent environmental effects are fitted for all 4 traits, but that only the corresponding covariance components which can be disentangled from the environmental covariances are reported.

Source: Simulated records

Example 14

This gives a toy example illustrating the use of the run option --snap for a simple GWAS type analysis.

Brief notes are supplied as a .pdf file with the example or can be downloaded as RNote_WOMBATSnappy.pdf

A
Demonstrates run option --snap. This invokes estimation of SNP effects and their standard errors for (co)variances assumed to be known. In addition to the data, pedigree and parameter file required for ’standard’ analyses, it requires a file supplying counts for the number of copies of the reference allele; this has the default name "SNPCounts.dat" and must be (or a link to it) in the working directory. The output file "SNPSolutions.dat" contains estimates of SNP effects, their standard errors and resulting t-values, followed by their p-value derived from the Normal density and log 10 of the latter (one per line).
B
As A, but for a bivariate analysis (2 traits, fit a single SNP). The output file SNPSolutions.dat contains 6 values per line: estimates of the SNP effects for traits 1 and 2, their standard errors and resulting t-values.
C
As A, but for two SNPs (no p-values).
D
Ditto, but two SNPs & two traits
E
As A with the addition of ‘explicit’ genetic groups

Source: Simulated records

Example 15

This example illustrates how to pool estimates of covariance components by (penalized) maximum likelihood.

Some notes are supplied as a .pdf file with the example or can be downloaded as RNote_WOMBATPool.pdf

The example is comprised of 14 traits, with 6 traits measured on few animals and the remaining records representing 4 traits with measures on males and females treated as different traits [from 33]. There are results from 76 bivariate and one six-variate analysis to be combined. Due to traits measured on different subsets of animals (sexes) there are a number of residual covariances which are to be fixed at zero.

A
All part analyses have been carried out using WOMBAT and a ‘full’ parameter file is available.
B
Results from part analyses are summarized in a single file and a ‘minimum’ parameter file is used.

Example 16

This example shows what options are available in WOMBAT to fit ‘social interaction’ type models.

A
This directory shows an example run for simulated data for a dilution factor of 0, treating direct and social genetic effects as uncorrelated.
B
As A, but allowing for a non-zero genetic covariance
C
As B, but treating social group and residual variances as heterogeneous.
D
Fitting the same model as in B to data simulated with a non-zero dilution factor, this directory shows the multiple runs required to estimated this factor using a quadratic approximation to the profile likelihood.
Z
Larry Schaeffer has some notes which contain a very simple example. This subdirectory shows how to obtain the BLUP solutions given using WOMBAT. www.aps.uoguelph.ca/ lrs/ABModels/NOTES/SSocial.pdf

Example 17

This example shows how WOMBAT can be used to approximate the sampling distribution of estimates by sampling from their asymptotic, multivariate normal distribution, as described by Meyer and Houle [35].

A
Sample estimates for the genetic covariance matrix in a 5-trait analysis (Note that 100 samples is used for illustration only - more should be used in practice!)

Example 18

This example illustrates the use of the single-step option, --s1step

A
Univariate analysis
B
As A, using –blup
C
Bivariate analysis
D
Univariate analysis, fitting "explicit" genetic groups

Example 19

This example illustrates the use of simple penalties to reduce sampling variances in REML estimation.

Brief notes are supplied as a .pdf file with the example or can be downloaded as RNote_WOMBATSimplePenalty.pdf

0
Parameter file to simulate data
A
Standard unpenalized analysis
B
Unpenalized analysis, parameterised to elements of the canonical decomposition
C
Penalty on canonical eigenvalues with ESS=8
D
Penalty on genetic partial correlations with ESS=8, shrinking towards phenotypic values
Z
Script to simulate replicates & Fortran code to summarize results

Example 20

This example shows some of the options available for a pre-analysis step with run option --hinv to calculate relationship matrices and accompanying statistics required for genomic evaluation. Specifically:

a)
the genomic relationship matrix (GRM) and its inverse,
b)
the submatrix of the numerator relationship matrix (NRM) for genotyped animals and its inverse,
c)
the inverse of the NRM including metafounders, and
d)
the inverse of the combined relationship for single step evaluation (H1).

It has grown to provide a substantial number of options. The relevant documentation is supplied as a .pdf file with the example or can be downloaded as RNote_WOMBATHinverse.pdf

A
Calculation of the complete H-inverse for simulated data with 370 genotypes and 750 animals in total.
A1
As A, but with scaling of the GRM as described by Christensen [7]
A2
As A, but estimating base population frequencies to center allele counts.
A3
Shows calculation of the diagonal elements of H for A, using special run option --hdiag. This inverts H1 using sparse matrix inversion and writes out the diagonal elements for all individuals.
A4
Shows calculation of the diagonal elements of H for A as an optional extra using Method 3 of Legarra et al. [24].
B
Data as A. Illustrates calculation of the GRM and GRM-Inverse only.
B1
As B but with eigen-decomposition of the GRM.
C
Data as A. Illustrates calculation of A22 only.
D
As A, but augmenting H1 by equations for "implicit" genetic groups (Group proportions to be read from pedigree file).
D1
As D, but carrying through genetic group proportions needed to fit genetic groups explicitly to *.codes file.
D2
As D, but reading unknown parent group (UPG) from colums 5 and 6 in pedigree file (must be coded 1 to number of groups) and writing them to colums 4 and 5 in corresponding *.codes file.
E
Data as A. Illustrates calculation of the GRM-Inverse only using the Woodbury formula.
F
Data as A. Illustrates calculation of H1 using the Woodbury formula to obtain the inverse of the GRM.
G
As B, but approximating the GRM inverse using the APY algorithm
H
As A, but approximating H1 using the APY algorithm
I
Shows computation of the inverse numerator relationship matrix with one metafounder; example from Legarra et al. [25].
I1
Computation of the inverse numerator relationship matrix with two metafounders; example from Legarra et al. [25].
J
Illustrates computation of H1 using an NRM with two metafounders. Example from github.com/alegarra/metafounders
K
As A, but adding a metafounder to A1, estimating γ using generalised least squares
L
As A, but adding multiple metafounders to A1, estimating Γ using the EM algorithm described by Garcia-Baccino et al. [16].

Example 21

Illustrates additional single step BLUP modules. Simulated data for two traits recorded on 600 animals; use marker allele counts for 1000 SNP for about a third of animals

A
SS-BLUP using run option --s1step. This requires matrix H1 as *.gin file which is not included (large); instead an additional *.par file is provided to set this up using run option --hinv.
A1
As A, but using run option --hdiag to obtain the diagonal elements of H (optional step 2) and --blup to obtain direct solutions with standard errors. If the *.hdiags file with diagonal elements of H (subsubsection 6.6.1.2) is found, corresponding accuracies are calculated.
B
SS-BLUP using run option --s2step (iteration on data). This requires the "add-on" in H1 (to A1) in binary form - this file is not included, but a *.par file is provided to compute it using run option --hinv.

Example 22

Illustrates post-BLUP/REML calculations to transform estimates of ‘breeding values’ into estimates of marker effects together with their standard errors, test statistics and p-values; see Gualdrón Duarte et al. [18] or Aguilar et al. [1].

The relevant documentation is supplied as a .pdf file with the example or can be downloaded as RNote_WOMBATssGWAS.pdf

Simulated data for a trait with heritability of 0.4; 130 animals with allele counts for 100 SNPs.

A
GBLUP analysis (all animals have genotypes) building the GRM using Van Raden’s Method 1 and centering markers using observed frequencies.
A1
As A, but using GWAS (EMMAX) via SNP Snappy (run option --snap); demonstrates equivalence of test statistics.
B
As A, but deleting 20% of genotypes at random to yield single step GWAS.