
# Pooling covariance components

A standard task in quantitative genetics is the estimation of covariance matrices, often for many traits and multiple sources of variation. Frequently, full multivariate analyses for all traits of interest are not computationally feasible, so estimates for all components of interest are obtained from analyses by parts. Typically, this comprises numerous low-dimensional analyses of overlapping subsets of traits. In the simplest scenario, these may be bivariate analyses for all pairs of traits. In other cases, e.g. when some traits are subject to selection based on other traits of interest, tri- or higher-variate analyses may be needed to counteract selection bias.

The task then is to combine -- or 'pool' -- the resulting estimates into overall covariance matrices for all traits. The main problems inherent in this are:

• The resulting matrices need to be positive (semi-) definite, i.e. they must not have negative eigenvalues.
• The elements of the overall matrices should be 'near' the estimates from individual analyses. In particular, the phenotypic components should not be distorted.
• Different part analyses may involve very different numbers of observations, and results should thus be weighted differently.
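
The first and third problems can be illustrated with a minimal sketch. It is not the pooling method described on this page: it simply averages the part estimates element by element, weighting each by its number of records, and then tests the result for positive definiteness by attempting a Cholesky factorisation. The function names, the three bivariate estimates and the record counts are all made up for illustration.

```python
import math


def pool_weighted(parts, n_traits):
    """Element-wise weighted average of part estimates.

    Each part is (trait indices, covariance matrix, number of records).
    Assumes every pair of traits occurs in at least one part analysis.
    """
    total = [[0.0] * n_traits for _ in range(n_traits)]
    weight = [[0.0] * n_traits for _ in range(n_traits)]
    for traits, cov, n_obs in parts:
        for a, i in enumerate(traits):
            for b, j in enumerate(traits):
                total[i][j] += n_obs * cov[a][b]
                weight[i][j] += n_obs
    return [[total[i][j] / weight[i][j] for j in range(n_traits)]
            for i in range(n_traits)]


def is_positive_definite(matrix):
    """Return True if a Cholesky factorisation of the matrix succeeds."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:  # non-positive pivot: not positive definite
                    return False
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


# Hypothetical bivariate estimates for three traits (illustrative values only).
parts = [
    ((0, 1), [[1.0, 0.9], [0.9, 1.0]], 500),
    ((0, 2), [[1.2, -0.9], [-0.9, 1.0]], 200),
    ((1, 2), [[1.0, 0.8], [0.8, 0.8]], 300),
]

pooled = pool_weighted(parts, 3)
parts_ok = all(is_positive_definite(cov) for _, cov, _ in parts)
pooled_ok = is_positive_definite(pooled)
print("all part estimates positive definite:", parts_ok)
print("naively pooled matrix positive definite:", pooled_ok)
```

In this made-up example every bivariate estimate is valid on its own, yet the element-wise average is not positive definite, because the three pairwise covariances are mutually inconsistent. This is why pooling needs more than simple averaging.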

### The penalized likelihood approach

We propose a likelihood approach to combine estimates for all sources of variation simultaneously. In brief, this relies on treating estimates from individual part analyses as data, i.e. matrices of corrected mean squares and cross-products. As such, it is similar to the methods suggested by Mantysaari (1999) (as implemented in PDMATRIX) and Thompson et al. (2005). However, there are two improvements:
1. Estimates for different sources of variation from the same part analysis are combined assuming a simple pseudo pedigree structure, and all covariance matrices are pooled at the same time. Doing so approximates sampling covariances between estimates from the same analysis, and thus restricts changes in their sum, i.e. the phenotypic components.
2. We can maximize the likelihood subject to a penalty aimed at 'borrowing strength' from the estimate of the phenotypic covariance matrix. As shown for full multivariate analyses, this can substantially reduce sampling variation, and thus yield estimates which are, on average, closer to the population values.
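
To make the idea of 'treating estimates as data' concrete, the following is a generic sketch for a single source of variation. It is not necessarily the exact likelihood maximized by the software. The notation is assumed for illustration: part analysis $k$ yields an estimate $\mathbf{S}_k$, regarded as a matrix of mean squares and cross-products with $\nu_k$ degrees of freedom, and $\boldsymbol{\Sigma}_k$ is the submatrix of the pooled matrix $\boldsymbol{\Sigma}$ for the traits in that analysis. A Wishart-type log likelihood then takes the form

$$
\log L(\boldsymbol{\Sigma}) = -\frac{1}{2} \sum_{k=1}^{K} \nu_k \left[ \log\left|\boldsymbol{\Sigma}_k\right| + \operatorname{tr}\left(\boldsymbol{\Sigma}_k^{-1}\mathbf{S}_k\right) \right] + \text{const}
$$

and a penalized version subtracts a penalty term,

$$
\log L_P(\boldsymbol{\Sigma}) = \log L(\boldsymbol{\Sigma}) - \frac{1}{2}\,\psi\, P(\boldsymbol{\Sigma}),
$$

where $\psi \ge 0$ is a tuning factor and $P(\cdot)$ is a placeholder for the chosen penalty function. The weights $\nu_k$ are what allow part analyses based on different numbers of observations to contribute differently. Pooling several sources of variation simultaneously, as in the first improvement listed above, requires the pseudo pedigree structure, which this sketch omits.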

### Software

The penalized likelihood approach has been implemented as an add-on to our software package WOMBAT. It is invoked using the run option --pool. There are two forms of input, one to combine part results obtained using WOMBAT, and a 'general' form suitable for estimates from any source. Details are described in the WOMBAT User Manual.

There is a set of notes describing the specific options available in conjunction with pooling (e.g. to select the pseudo pedigree structures or penalty to be imposed), and a worked example.