Chapter 9
Worked examples
A number of worked examples are provided to illustrate the use of WOMBAT and,
in particular, show how to set up the parameter files.
Installation for the suite of examples is described in section 3.1.4 . This
generates the directory WOMBAT/Examples with subdirectories Examplen
(n = 1,…,9).
Each subdirectory contains the data and pedigree files for a particular example, a file
WhatIsIt with a brief description of the example, and one or subdirectories for
individual runs, A, B, C, ….
Each ‘run’ directory (A, B, …) contains :
-
(a)
- A parameter file (.par)
-
(b)
- The file typescript (generated using the script command) which
contains the screen output for the run.
Run time options used can be found at the top of this file.
-
(c)
- The numerous output files generated by WOMBAT.
N.B.: The example data sets have been selected for ease of
demonstration, and to allow fairly rapid replication of the
example runs. Clearly, most of the data sets used are too small
to support estimation of all the parameters fitted, in particular
for the higher dimensional analyses shown !
N.B.: Examples runs have been carried out on a 64-bit machine
(Intel I7 processor, rated at 3.20Ghz) ; numbers obtained on
32-bit machine may vary slightly.
Note further that all the example files are Linux files - you
may need to ’translate’ them to DOS format if you plan to run
the examples under Windows.
Example 1
This shows a univariate analysis for a simple animal model, fitting a single fixed
effect only.
Source: Simulated data; Example 1 from DfReml
Example 2
This shows a bivariate analysis for the case where the same model is fitted for both
traits and all traits are recorded for all animals. The model of analysis includes 3
cross-classified fixed effects and an additional random effect
-
A
- Analysis fitting an animal model
-
B
- Analysis fitting a sire model
Source: Data from Edinburgh mouse lines; Example 2 from DfReml
Example 3
This example involves up to six repeated records for a single trait, recorded at
different ages. The model of analysis is an animal model with a single fixed effect.
Data are analysed :
-
A
- Fitting a univariate ‘repeatability’ model, with age as a covariable
-
B
- Fitting a multivariate analysis with 6 traits
-
C
- Fitting a univariate random regression model
Source: Wokalup selection experiment; Example 3 from DfReml
Example 4
This example shows a four-variate analysis for a simple animal model. Runs
show:
-
A
- A ‘standard’ full rank analysis
-
B
- A reduced rank analysis, fitting the first two principal components only
-
C
- A reduced rank analysis using the EM algorithm
Source: Australian beef cattle field data
Example 5
Similar to example 4, but involving 6 traits. Runs show:
-
A
- A ‘standard’ full rank analysis
-
B
- A reduced rank analysis, fitting the first four principal components
-
C
- An analysis fitting a factor-analytic structure for the genetic covariance
matrix
-
D
- A full rank analysis, illustrating use of penalized REML (’bending’) for a
chosen tuning parameter
Source: Australian beef cattle data
Example 6
This example involves 4 measurements, with records on different sexes treated as
different traits. This gives an eight-variate analysis, with 16 residual covariances
equal to zero. The model of analysis is a simple animal model.
-
A
- A full rank analysis with ‘good’ starting values
-
B
- A full rank analysis with ‘bad’ starting values
Source: Australian beef cattle field data
Example 7
This example illustrates the analysis of 4 traits, subject to genetic and permanent
environmental effects. The model of analysis involves several crossclassified fixed
effects, nested covariables and different effects for different traits.
-
A
- Univariate analysis for trait 1
-
B
- Univariate analysis for trait 2
-
B1
- As B but reworked by setting up NRM inverse externally to illustrate use
of GIN option
-
B2
- As B, but fitting fixed effects only and a user-defined basis function for the
covariable dam age
-
C
- Univariate analysis for trait 2, allowing for a non-zero direct-maternal genetic
covariance
-
C1
- As C but using GIN instead of NRM option
-
D
- Bivariate analysis for traits 1 and 2
-
E
- Bivariate analysis for traits 1 and 2, allowing for a non-zero direct-maternal
genetic covariance
-
F
- Trivariate analysis for traits 1, 2 and 3
-
G
- Fourvariate analysis of all traits
-
H
- Fourvariate analysis of all traits, not fitting maternal effects for trait 4
-
I
- Reduced rank, fourvariate analysis of all traits, not fitting maternal effects
for trait 4
Source: Wokalup selection experiment
Example 8
This is an example of a model where different random effects are fitted for different
traits. It is a bivariate analysis of mature cow weights together with gestation length.
Mature cow weight involves repeated records per animal, and a permanent
environmental effect of the animal is thus fitted for this trait. Gestation length is
treated as trait of the calf and assumed to be affected by both genetic and permanent
environmental effects.
-
A
- Standard model
-
B
- Equivalent model, using the PEQ option for permanent environmental effects
of the animal.
Source: Australian beef cattle field data
Example 9
This example illustrates random regression analyses fitting an additional random
effect, using B-splines as basis functions and imposing rank restrictions on estimated
covariance functions. Data are monthly records for weights of calves from birth to
weaning.
-
A
- Full rank analysis
-
A1
- As A, but evaluating the basis functions externally to illustrate the use of
user-defined basis functions.
-
B
- Reduced rank analysis
-
C
- Reduced rank analysis with different ranks
Source: Wokalup selection experiment
Example 10
In a recent paper, Wilson et al. [49] presented a tutorial on ‘animal model’
analyses of data from a wild population. This example replicates the analyses
shown (and expands it by demonstrating how to evaluate likelihood fixing the
genetic covariance at zero, in order to carry out a likelihood ratio test for this
component).
The tutorial and all data (& pedigree) files used in this example are not included –
you have to download these from their web site:
http://wildanimalmodels.org
-
A
- Simple univariate analysis
-
B
- Bivariate analysis
-
B1
- Bivariate analysis, fixing the genetic covariance at zero
-
C
- Repeated records model
Source: The Wild Animal Models Wiki
Example 11
This example demonstrates simple bi-variate random regression analyses.
-
A
- Using a RR model to carry out a multivariate analysis with repeated records
where the pattern of temporary environmental covariances is determined
through the control variable.
-
A1
- Multivariate analysis corresponding to A
-
B
- Bi-variate RR analysis fitting a quadratic regression on Legendre polynomials
and homogeneous measurement error variances.
Source: Simulated records
Example 13
This example illustrates multivariate analyses with repeated records per trait,
especially some new features:
- WOMBAT now insists that the option RPTCOV is specified in the
parameter file!
- WOMBAT writes out a file RepeatedRecordsCounts with some basic
information on how many animals have how many records.
- There is now a mechanism – through RPTCOV TSELECT – to specify which
records are measured at the same time and thus have a non-zero error
covariance and which are not.
- Trait numbers need to be assigned so that any traits with repeated records
have a lower number than traits with single records.
The data were obtained by simulating records for 4 traits recorded on 800 animals at
5 different times. A missing value indicator (999) is used to create different pattern of
missing records - note that analyses in the different sub-directories analyze different
columns in the data file.
-
A
- Demonstrates an analysis without missing records, i.e. where all traits are
recorded at the same time. This implies that there are non-zero error
covariances between all traits and that the ALIGNED option is appropriate.
-
B
- Shows the analysis when some records are missing, but in a systematic
fashion: Traits 1 and 2 have records at all 5 times, but traits 3 and 4 are
only recorded for times 1 and 2. As the ‘missing’ observations only occur for
the later times, the option ALIGNED is still appropriate.
-
C
- Similar to B, but measurements for traits 3 and 4 are taken at times 2 and
4. This means that a time of recording indicator needs to be used to model
the residual covariance structure correctly. This is done specifying TSELECT
together with the name of the column in the data file which contains the
time variable.
-
C1
- As C, but using a multivariate random regression analysis.
-
D
- Illustrates the scenario where we have a trait with repeated records analysed
together with traits with single records and where traits with single and repeated
records are measured at different times so that
-
i)
- there are no error covariances between these groups of traits and
-
ii)
- that we can ‘use’ the error covariance to model covariances between
traits due to permanent environmental effects of the animal.
For this example, we use records taken at times 1 to 4 for trait 1, and
records taken at time 5 for traits 2 to 4. For this case a model fitting
a permanent environmental effects due to the animal for trait 1 only
together with the INDESCR option is appropriate. Estimates of the error
covariances betwen trait 1 and traits 2, 3 and 4 then reflect the permanent
environmental covariance, while estimates of the (co)variances among the
latter represent the sum of temporary and permanent environmental
covariances.
-
E
- Shows the case where we have a trait with repeated records analysed together
with traits with single records, but where the single records are taken at the same
time as one of the repeated records, so that we need to model non-zero error
covariances. Here we consider records for trait 1 at all 5 times, and records for
traits 2 to 3 taken at time 5. Again we need the TSELECT option to model this
properly. In addition, we need to use the equivalent model invoked via the PEQ
option in order to separate temporary and permanent envirionmental covariances
between trait 1 and the other traits. Note that permanent environmental effects
are fitted for all 4 traits, but that only the corresponding covariance
components which can be disentangled from the environmental covariances are
reported.
Source: Simulated records
Example 14
This gives a toy example illustrating the use of the run option --snap for a simple
GWAS type analysis.
Brief notes are supplied as a .pdf file with the example or can be downloaded as
RNote_WOMBATSnappy.pdf
-
A
- Demonstrates run option --snap. This invokes estimation of SNP effects
and their standard errors for (co)variances assumed to be known. In
addition to the data, pedigree and parameter file required for ’standard’
analyses, it requires a file supplying counts for the number of copies
of the reference allele; this has the default name "SNPCounts.dat" and
must be (or a link to it) in the working directory. The output file
"SNPSolutions.dat" contains estimates of SNP effects, their standard
errors and resulting t-values, followed by their p-value derived from the
Normal density and log 10 of the latter (one per line).
-
B
- As A, but for a bivariate analysis (2 traits, fit a single SNP). The output
file SNPSolutions.dat contains 6 values per line: estimates of the SNP
effects for traits 1 and 2, their standard errors and resulting t-values.
-
C
- As A, but for two SNPs (no p-values).
-
D
- Ditto, but two SNPs & two traits
-
E
- As A with the addition of ‘explicit’ genetic groups
Source: Simulated records
Example 15
This example illustrates how to pool estimates of covariance components by
(penalized) maximum likelihood.
Some notes are supplied as a .pdf file with the example or can be downloaded as
RNote_WOMBATPool.pdf
The example is comprised of 14 traits, with 6 traits measured on few animals and the
remaining records representing 4 traits with measures on males and females treated
as different traits [from 33]. There are results from 76 bivariate and one six-variate
analysis to be combined. Due to traits measured on different subsets of animals
(sexes) there are a number of residual covariances which are to be fixed at
zero.
-
A
- All part analyses have been carried out using WOMBAT and a ‘full’
parameter file is available.
-
B
- Results from part analyses are summarized in a single file and a ‘minimum’
parameter file is used.
Example 16
This example shows what options are available in WOMBAT to fit ‘social
interaction’ type models.
-
A
- This directory shows an example run for simulated data for a dilution factor
of 0, treating direct and social genetic effects as uncorrelated.
-
B
- As A, but allowing for a non-zero genetic covariance
-
C
- As B, but treating social group and residual variances as heterogeneous.
-
D
- Fitting the same model as in B to data simulated with a non-zero dilution
factor, this directory shows the multiple runs required to estimated this factor
using a quadratic approximation to the profile likelihood.
-
Z
- Larry Schaeffer has some notes which contain a very simple example.
This subdirectory shows how to obtain the BLUP solutions given using
WOMBAT. www.aps.uoguelph.ca/ lrs/ABModels/NOTES/SSocial.pdf
Example 17
This example shows how WOMBAT can be used to approximate the sampling
distribution of estimates by sampling from their asymptotic, multivariate normal
distribution, as described by Meyer and Houle [35].
-
A
- Sample estimates for the genetic covariance matrix in a 5-trait analysis (Note
that 100 samples is used for illustration only - more should be used in
practice!)
Example 18
This example illustrates the use of the single-step option, --s1step
-
A
- Univariate analysis
-
B
- As A, using –blup
-
C
- Bivariate analysis
-
D
- Univariate analysis, fitting "explicit" genetic groups
Example 19
This example illustrates the use of simple penalties to reduce sampling variances in
REML estimation.
Brief notes are supplied as a .pdf file with the example or can be downloaded as
RNote_WOMBATSimplePenalty.pdf
-
0
- Parameter file to simulate data
-
A
- Standard unpenalized analysis
-
B
- Unpenalized analysis, parameterised to elements of the canonical
decomposition
-
C
- Penalty on canonical eigenvalues with ESS=8
-
D
- Penalty on genetic partial correlations with ESS=8, shrinking towards
phenotypic values
-
Z
- Script to simulate replicates & Fortran code to summarize results
Example 20
This example shows some of the options available for a pre-analysis step with run
option --hinv to calculate relationship matrices and accompanying statistics
required for genomic evaluation. Specifically:
-
a)
- the genomic relationship matrix (GRM) and its inverse,
-
b)
- the submatrix of the numerator relationship matrix (NRM) for genotyped
animals and its inverse,
-
c)
- the inverse of the NRM including metafounders, and
-
d)
- the inverse of the combined relationship for single step evaluation (H−1).
It has grown to provide a substantial number of options. The relevant
documentation is supplied as a .pdf file with the example or can be downloaded as
RNote_WOMBATHinverse.pdf
-
A
- Calculation of the complete H-inverse for simulated data with 370
genotypes and 750 animals in total.
-
A1
- As A, but with scaling of the GRM as described by Christensen [7]
-
A2
- As A, but estimating base population frequencies to center allele counts.
-
A3
- Shows calculation of the diagonal elements of H for A, using special
run option --hdiag. This inverts H−1 using sparse matrix inversion and
writes out the diagonal elements for all individuals.
-
A4
- Shows calculation of the diagonal elements of H for A as an optional
extra using Method 3 of Legarra et al. [24].
-
B
- Data as A. Illustrates calculation of the GRM and GRM-Inverse only.
-
B1
- As B but with eigen-decomposition of the GRM.
-
C
- Data as A. Illustrates calculation of A22 only.
-
D
- As A, but augmenting H−1 by equations for "implicit" genetic groups
(Group proportions to be read from pedigree file).
-
D1
- As D, but carrying through genetic group proportions needed to fit genetic
groups explicitly to *.codes file.
-
D2
- As D, but reading unknown parent group (UPG) from colums 5 and 6 in
pedigree file (must be coded 1 to number of groups) and writing them to
colums 4 and 5 in corresponding *.codes file.
-
E
- Data as A. Illustrates calculation of the GRM-Inverse only using the
Woodbury formula.
-
F
- Data as A. Illustrates calculation of H−1 using the Woodbury formula
to obtain the inverse of the GRM.
-
G
- As B, but approximating the GRM inverse using the APY algorithm
-
H
- As A, but approximating H−1 using the APY algorithm
-
I
- Shows computation of the inverse numerator relationship matrix with one
metafounder; example from Legarra et al. [25].
-
I1
- Computation of the inverse numerator relationship matrix with two
metafounders; example from Legarra et al. [25].
-
J
- Illustrates computation of H−1 using an NRM with two metafounders.
Example from github.com/alegarra/metafounders
-
K
- As A, but adding a metafounder to A−1, estimating γ using generalised
least squares
-
L
- As A, but adding multiple metafounders to A−1, estimating Γ using the
EM algorithm described by Garcia-Baccino et al. [16].
Example 21
Illustrates additional single step BLUP modules. Simulated data for two traits
recorded on 600 animals; use marker allele counts for 1000 SNP for about a third of
animals
-
A
- SS-BLUP using run option --s1step. This requires matrix H−1 as *.gin
file which is not included (large); instead an additional *.par file is
provided to set this up using run option --hinv.
-
A1
- As A, but using run option --hdiag to obtain the diagonal
elements of H (optional step 2) and --blup to obtain direct solutions
with standard errors. If the *.hdiags file with diagonal elements
of H (subsubsection 6.6.1.2) is found, corresponding accuracies are
calculated.
-
B
- SS-BLUP using run option --s2step (iteration on data). This requires
the "add-on" in H−1 (to A−1) in binary form - this file is not included,
but a *.par file is provided to compute it using run option --hinv.
Example 22
Illustrates post-BLUP/REML calculations to transform estimates of ‘breeding
values’ into estimates of marker effects together with their standard errors,
test statistics and p-values; see Gualdrón Duarte et al. [18] or Aguilar
et al. [1].
The relevant documentation is supplied as a .pdf file with the example or can be
downloaded as RNote_WOMBATssGWAS.pdf
Simulated data for a trait with heritability of 0.4; 130 animals with allele counts for
100 SNPs.
-
A
- GBLUP analysis (all animals have genotypes) building the GRM using
Van Raden’s Method 1 and centering markers using observed frequencies.
-
A1
- As A, but using GWAS (EMMAX) via SNP Snappy (run option --snap);
demonstrates equivalence of test statistics.
-
B
- As A, but deleting 20% of genotypes at random to yield single step GWAS.