9 Worked examples

Chapter 9
Worked examples

A number of worked examples are provided to illustrate the use of WOMBAT and, in particular, show how to set up the parameter ﬁles.

Installation for the suite of examples is described in section 3.1.4. This generates the directory WOMBAT/Examples with subdirectories Examplen (n = 1,…,9).

Each subdirectory contains the data and pedigree ﬁles for a particular example, a ﬁle WhatIsIt with a brief description of the example, and one or subdirectories for individual runs, A, B, C, ….

Each ‘run’ directory (A, B, …) contains :

(a): A parameter ﬁle (.par)
(b): The ﬁle typescript (generated using the script command) which contains the screen output for the run.
Run time options used can be found at the top of this ﬁle.
(c): The numerous output ﬁles generated by WOMBAT.

N.B.: The example data sets have been selected for ease of demonstration, and to allow fairly rapid replication of the example runs. Clearly, most of the data sets used are too small to support estimation of all the parameters ﬁtted, in particular for the higher dimensional analyses shown !

N.B.: Examples runs have been carried out on a 64-bit machine (Intel I7 processor, rated at 3.20Ghz) ; numbers obtained on 32-bit machine may vary slightly.
Note further that all the example ﬁles are Linux ﬁles - you may need to ’translate’ them to DOS format if you plan to run the examples under Windows.

Example 1

This shows a univariate analysis for a simple animal model, ﬁtting a single ﬁxed eﬀect only.

Source: Simulated data; Example 1 from DfReml

Example 2

This shows a bivariate analysis for the case where the same model is ﬁtted for both traits and all traits are recorded for all animals. The model of analysis includes 3 cross-classiﬁed ﬁxed eﬀects and an additional random eﬀect

A: Analysis ﬁtting an animal model
B: Analysis ﬁtting a sire model

Source: Data from Edinburgh mouse lines; Example 2 from DfReml

Example 3

This example involves up to six repeated records for a single trait, recorded at diﬀerent ages. The model of analysis is an animal model with a single ﬁxed eﬀect. Data are analysed :

A: Fitting a univariate ‘repeatability’ model, with age as a covariable
B: Fitting a multivariate analysis with 6 traits
C: Fitting a univariate random regression model

Source: Wokalup selection experiment; Example 3 from DfReml

Example 4

This example shows a four-variate analysis for a simple animal model. Runs show:

A: A ‘standard’ full rank analysis
B: A reduced rank analysis, ﬁtting the ﬁrst two principal components only
C: A reduced rank analysis using the EM algorithm

Source: Australian beef cattle ﬁeld data

Example 5

Similar to example 4, but involving 6 traits. Runs show:

A: A ‘standard’ full rank analysis
B: A reduced rank analysis, ﬁtting the ﬁrst four principal components
C: An analysis ﬁtting a factor-analytic structure for the genetic covariance matrix
D: A full rank analysis, illustrating use of penalized REML (’bending’) for a chosen tuning parameter

Source: Australian beef cattle data

Example 6

This example involves 4 measurements, with records on diﬀerent sexes treated as diﬀerent traits. This gives an eight-variate analysis, with 16 residual covariances equal to zero. The model of analysis is a simple animal model.

A: A full rank analysis with ‘good’ starting values
B: A full rank analysis with ‘bad’ starting values

Source: Australian beef cattle ﬁeld data

Example 7

This example illustrates the analysis of 4 traits, subject to genetic and permanent environmental eﬀects. The model of analysis involves several crossclassiﬁed ﬁxed eﬀects, nested covariables and diﬀerent eﬀects for diﬀerent traits.

A: Univariate analysis for trait 1
B: Univariate analysis for trait 2
B1: As B but reworked by setting up NRM inverse externally to illustrate use of GIN option
B2: As B, but ﬁtting ﬁxed eﬀects only and a user-deﬁned basis function for the covariable dam age
C: Univariate analysis for trait 2, allowing for a non-zero direct-maternal genetic covariance
C1: As C but using GIN instead of NRM option
D: Bivariate analysis for traits 1 and 2
E: Bivariate analysis for traits 1 and 2, allowing for a non-zero direct-maternal genetic covariance
F: Trivariate analysis for traits 1, 2 and 3
G: Fourvariate analysis of all traits
H: Fourvariate analysis of all traits, not ﬁtting maternal eﬀects for trait 4
I: Reduced rank, fourvariate analysis of all traits, not ﬁtting maternal eﬀects for trait 4

Source: Wokalup selection experiment

Example 8

This is an example of a model where diﬀerent random eﬀects are ﬁtted for diﬀerent traits. It is a bivariate analysis of mature cow weights together with gestation length. Mature cow weight involves repeated records per animal, and a permanent environmental eﬀect of the animal is thus ﬁtted for this trait. Gestation length is treated as trait of the calf and assumed to be aﬀected by both genetic and permanent environmental eﬀects.

A: Standard model
B: Equivalent model, using the PEQ option for permanent environmental eﬀects of the animal.

Source: Australian beef cattle ﬁeld data

Example 9

This example illustrates random regression analyses ﬁtting an additional random eﬀect, using B-splines as basis functions and imposing rank restrictions on estimated covariance functions. Data are monthly records for weights of calves from birth to weaning.

A: Full rank analysis
A1: As A, but evaluating the basis functions externally to illustrate the use of user-deﬁned basis functions.
B: Reduced rank analysis
C: Reduced rank analysis with diﬀerent ranks

Source: Wokalup selection experiment

Example 10

In a recent paper, Wilson et al. [48] presented a tutorial on ‘animal model’ analyses of data from a wild population. This example replicates the analyses shown (and expands it by demonstrating how to evaluate likelihood ﬁxing the genetic covariance at zero, in order to carry out a likelihood ratio test for this component).

The tutorial and all data (& pedigree) ﬁles used in this example are not included – you have to download these from their web site:

http://wildanimalmodels.org

A: Simple univariate analysis
B: Bivariate analysis
B1: Bivariate analysis, ﬁxing the genetic covariance at zero
C: Repeated records model

Source: The Wild Animal Models Wiki

Example 11

This example demonstrates simple bi-variate random regression analyses.

A: Using a RR model to carry out a multivariate analysis with repeated records where the pattern of temporary environmental covariances is determined through the control variable.
A1: Multivariate analysis corresponding to A
B: Bi-variate RR analysis ﬁtting a quadratic regression on Legendre polynomials and homogeneous measurement error variances.

Source: Simulated records

Example 13

This example illustrates multivariate analyses with repeated records per trait, especially some new features:

WOMBAT now insists that the option RPTCOV is speciﬁed in the parameter ﬁle!
WOMBAT writes out a ﬁle RepeatedRecordsCounts with some basic information on how many animals have how many records.
There is now a mechanism – through RPTCOV TSELECT – to specify which records are measured at the same time and thus have a non-zero error covariance and which are not.
Trait numbers need to be assigned so that any traits with repeated records have a lower number than traits with single records.

The data were obtained by simulating records for 4 traits recorded on 800 animals at 5 diﬀerent times. A missing value indicator (999) is used to create diﬀerent pattern of missing records - note that analyses in the diﬀerent sub-directories analyze diﬀerent columns in the data ﬁle.

A

Demonstrates an analysis without missing records, i.e. where all traits are recorded at the same time. This implies that there are non-zero error covariances between all traits and that the ALIGNED option is appropriate.

B

Shows the analysis when some records are missing, but in a systematic fashion: Traits 1 and 2 have records at all 5 times, but traits 3 and 4 are only recorded for times 1 and 2. As the ‘missing’ observations only occur for the later times, the option ALIGNED is still appropriate.

C

Similar to B, but measurements for traits 3 and 4 are taken at times 2 and 4. This means that a time of recording indicator needs to be used to model the residual covariance structure correctly. This is done specifying TSELECT together with the name of the column in the data ﬁle which contains the time variable.

C1

As C, but using a multivariate random regression analysis.

D

Illustrates the scenario where we have a trait with repeated records analysed together with traits with single records and where traits with single and repeated records are measured at diﬀerent times so that

i): there are no error covariances between these groups of traits and
ii): that we can ‘use’ the error covariance to model covariances between traits due to permanent environmental eﬀects of the animal.

For this example, we use records taken at times 1 to 4 for trait 1, and records taken at time 5 for traits 2 to 4. For this case a model ﬁtting a permanent environmental eﬀects due to the animal for trait 1 only together with the INDESCR option is appropriate. Estimates of the error covariances betwen trait 1 and traits 2, 3 and 4 then reﬂect the permanent environmental covariance, while estimates of the (co)variances among the latter represent the sum of temporary and permanent environmental covariances.

E

Shows the case where we have a trait with repeated records analysed together with traits with single records, but where the single records are taken at the same time as one of the repeated records, so that we need to model non-zero error covariances. Here we consider records for trait 1 at all 5 times, and records for traits 2 to 3 taken at time 5. Again we need the TSELECT option to model this properly. In addition, we need to use the equivalent model invoked via the PEQ option in order to separate temporary and permanent envirionmental covariances between trait 1 and the other traits. Note that permanent environmental eﬀects are ﬁtted for all 4 traits, but that only the corresponding covariance components which can be disentangled from the environmental covariances are reported.

Source: Simulated records

Example 14

This gives a toy example illustrating the use of the run option --snap for a simple GWAS type analysis.

Brief notes are supplied as a .pdf ﬁle with the example or can be downloaded as RNote_WOMBATSnappy.pdf

A: Demonstrates run option --snap. This invokes estimation of SNP eﬀects and their standard errors for (co)variances assumed to be known. In addition to the data, pedigree and parameter ﬁle required for ’standard’ analyses, it requires a ﬁle supplying counts for the number of copies of the reference allele; this has the default name "SNPCounts.dat" and must be (or a link to it) in the working directory. The output ﬁle "SNPSolutions.dat" contains estimates of SNP eﬀects, their standard errors and resulting t-values, followed by their p-value derived from the Normal density and log 10 of the latter (one per line).
B: As A, but for a bivariate analysis (2 traits, ﬁt a single SNP). The output ﬁle SNPSolutions.dat contains 6 values per line: estimates of the SNP eﬀects for traits 1 and 2, their standard errors and resulting t-values.
C: As A, but for two SNPs (no p-values).
D: Ditto, but two SNPs & two traits
E: As A with the addition of ‘explicit’ genetic groups

Source: Simulated records

Example 15

This example illustrates how to pool estimates of covariance components by (penalized) maximum likelihood.

Some notes are supplied as a .pdf ﬁle with the example or can be downloaded as RNote_WOMBATPool.pdf

The example is comprised of 14 traits, with 6 traits measured on few animals and the remaining records representing 4 traits with measures on males and females treated as diﬀerent traits [from 32]. There are results from 76 bivariate and one six-variate analysis to be combined. Due to traits measured on diﬀerent subsets of animals (sexes) there are a number of residual covariances which are to be ﬁxed at zero.

A: All part analyses have been carried out using WOMBAT and a ‘full’ parameter ﬁle is available.
B: Results from part analyses are summarized in a single ﬁle and a ‘minimum’ parameter ﬁle is used.

Example 16

This example shows what options are available in WOMBAT to ﬁt ‘social interaction’ type models.

A: This directory shows an example run for simulated data for a dilution factor of 0, treating direct and social genetic eﬀects as uncorrelated.
B: As A, but allowing for a non-zero genetic covariance
C: As B, but treating social group and residual variances as heterogeneous.
D: Fitting the same model as in B to data simulated with a non-zero dilution factor, this directory shows the multiple runs required to estimated this factor using a quadratic approximation to the proﬁle likelihood.
Z: Larry Schaeﬀer has some notes which contain a very simple example. This subdirectory shows how to obtain the BLUP solutions given using WOMBAT. www.aps.uoguelph.ca/ lrs/ABModels/NOTES/SSocial.pdf

Example 17

This example shows how WOMBAT can be used to approximate the sampling distribution of estimates by sampling from their asymptotic, multivariate normal distribution, as described by Meyer and Houle [34].

A: Sample estimates for the genetic covariance matrix in a 5-trait analysis (Note that 100 samples is used for illustration only - more should be used in practice!)

Example 18

This example illustrates the use of the single-step option, --s1step

A: Univariate analysis
B: As A, using –blup
C: Bivariate analysis
D: Univariate analysis, ﬁtting "explicit" genetic groups

Example 19

This example illustrates the use of simple penalties to reduce sampling variances in REML estimation.

Brief notes are supplied as a .pdf ﬁle with the example or can be downloaded as RNote_WOMBATSimplePenalty.pdf

0: Parameter ﬁle to simulate data
A: Standard unpenalized analysis
B: Unpenalized analysis, parameterised to elements of the canonical decomposition
C: Penalty on canonical eigenvalues with ESS=8
D: Penalty on genetic partial correlations with ESS=8, shrinking towards phenotypic values
Z: Script to simulate replicates & Fortran code to summarize results

Example 20

This example shows some of the options available for a pre-analysis step with run option --hinv to calculate relationship matrices and accompanying statistics required for genomic evaluation. Speciﬁcally:

a): the genomic relationship matrix (GRM) and its inverse,
b): the submatrix of the numerator relationship matrix (NRM) for genotyped animals and its inverse,
c): the inverse of the NRM including metafounders, and
d): the inverse of the combined relationship for single step evaluation (H⁻¹).

It has grown to provide a substantial number of options. The relevant documentation is supplied as a .pdf ﬁle with the example or can be downloaded as RNote_WOMBATHinverse.pdf

A: Calculation of the complete H-inverse for simulated data with 370 genotypes and 750 animals in total.
A1: As A, but with scaling of the GRM as described by Christensen [7]
A2: As A, but estimating base population frequencies to center allele counts.
A3: Shows calculation of the diagonal elements of H for A, using special run option --hdiag. This inverts H⁻¹ using sparse matrix inversion and writes out the diagonal elements for all individuals.
A4: Shows calculation of the diagonal elements of H for A as an optional extra using Method 3 of Legarra et al. [23].
B: Data as A. Illustrates calculation of the GRM and GRM-Inverse only.
B1: As B but with eigen-decomposition of the GRM.
C: Data as A. Illustrates calculation of A₂₂ only.
D: As A, but augmenting H⁻¹ by equations for "implicit" genetic groups (Group proportions to be read from pedigree ﬁle).
D1: As D, but carrying through genetic group proportions needed to ﬁt genetic groups explicitly to *.codes ﬁle.
D2: As D, but reading unknown parent group (UPG) from colums 5 and 6 in pedigree ﬁle (must be coded 1 to number of groups) and writing them to colums 4 and 5 in corresponding *.codes ﬁle.
E: Data as A. Illustrates calculation of the GRM-Inverse only using the Woodbury formula.
F: Data as A. Illustrates calculation of H⁻¹ using the Woodbury formula to obtain the inverse of the GRM.
G: As B, but approximating the GRM inverse using the APY algorithm
H: As A, but approximating H⁻¹ using the APY algorithm
I: Shows computation of the inverse numerator relationship matrix with one metafounder; example from Legarra et al. [24].
I1: Computation of the inverse numerator relationship matrix with two metafounders; example from Legarra et al. [24].
J: Illustrates computation of H⁻¹ using an NRM with two metafounders. Example from github.com/alegarra/metafounders
K: As A, but adding a metafounder to A⁻¹, estimating γ using generalised least squares
L: As A, but adding multiple metafounders to A⁻¹, estimating Γ using the EM algorithm described by Garcia-Baccino et al. [15].

Example 21

Illustrates additional single step BLUP modules. Simulated data for two traits recorded on 600 animals; use marker allele counts for 1000 SNP for about a third of animals

A: SS-BLUP using run option --s1step. This requires matrix H⁻¹ as *.gin ﬁle which is not included (large); instead an additional *.par ﬁle is provided to set this up using run option --hinv.
A1: As A, but using run option --hdiag to obtain the diagonal elements of H (optional step 2) and --blup to obtain direct solutions with standard errors. If the *.hdiags ﬁle with diagonal elements of H (subsubsection 6.6.1.2) is found, corresponding accuracies are calculated.
B: SS-BLUP using run option --s2step (iteration on data). This requires the "add-on" in H⁻¹ (to A⁻¹) in binary form - this ﬁle is not included, but a *.par ﬁle is provided to compute it using run option --hinv.

Example 22

Illustrates post-BLUP/REML calculations to transform estimates of ‘breeding values’ into estimates of marker eﬀects together with their standard errors, test statistics and p-values; see Gualdrón Duarte et al. [17] or Aguilar et al. [1].

The relevant documentation is supplied as a .pdf ﬁle with the example or can be downloaded as RNote_WOMBATssGWAS.pdf

Simulated data for a trait with heritability of 0.4; 130 animals with allele counts for 100 SNPs.

A: GBLUP analysis (all animals have genotypes) building the GRM using Van Raden’s Method 1 and centering markers using observed frequencies.
A1: As A, but using GWAS (EMMAX) via SNP Snappy (run option --snap); demonstrates equivalence of test statistics.
B: As A, but deleting 20% of genotypes at random to yield single step GWAS.

[next] [prev] [prev-tail] [front] [up]

Chapter 9Worked examples

Example 1

Example 2

Example 3

Example 4

Example 5

Example 6

Example 7

Example 8

Example 9

Example 10

Example 11

Example 13

Example 14

Example 15

Example 16

Example 17

Example 18

Example 19

Example 20

Example 21

Example 22

Chapter 9
Worked examples