6.5 Other Files

Depending on the model of analysis chosen, additional input files may be required.

6.5.1 General inverse file

For each random effect fitted for which the covariance option GIN has been specified, WOMBAT expects a file set up by the user which contains the inverse of the matrix (such as a relationship or correlation matrix) which determines the ‘structure’ of the covariance matrix for the random effect. The following rules apply:

The file name should be equal to the name of the random effect, with the extension .gin. For example, mother.gin for a random effect called mother.
For random effect names containing additional information in round brackets, for instance in RR analysis, only the part preceding the ‘(’ should be used. In this case, be careful to name the effects in the model so that no ambiguities arise!
The first line of the file should contain a real variable with value equal to the log determinant of the covariance/general relationship matrix (NB: This is the log determinant of the matrix, not of the inverse; this can generally be calculated as a ‘by-product’ during inversion).
This comprises a constant term in the (log) likelihood, i.e. any value can be given (e.g. zero) if no comparisons between models are required.
Optionally, this can be followed (separated by space(s)) by the keyword “DENSE”. If given, WOMBAT will store the elements of the general relationship matrix in core, assuming it is dense, i.e. for n levels, an array of size n(n+1)/2 is used. This can require substantial additional memory, but reduces the overhead incurred by re-reading this matrix from disk for every iteration, and may be advantageous if the matrix is (almost) dense, such as the inverse of a genomic relationship matrix.
The file should then contain one line for each non-zero element in the inverse. Each line is expected to contain three space-separated variables:
An integer code for the ‘column’ number
An integer code for the ‘row’ number
A real variable specifying the element of the inverse

Here ‘row’ and ‘column’ numbers should range from 1 to N, where N is the number of levels for the random effect.
Only the elements of the lower triangle of the inverse should be given and given ‘row-wise’, i.e. WOMBAT expects a ‘column’ number which is less than or equal to the ‘row’ number.

Codes for GIN levels

By default, WOMBAT determines the number of levels for a random effect with covariance option GIN from the data, renumbering them in ascending numerical order. In some cases, however, we might want to fit additional levels, not represented in the data. A typical example is an additional genetic effect, which can have levels not in the data linked to those in the data through covariances arising from co-ancestry.

If WOMBAT encounters row or column numbers greater than the number of random effect levels found in the data, it will take the following action:

It is checked that this number does not exceed the maximum number of random effects levels as specified in the parameter file. If it does, WOMBAT stops (change parameter file if necessary).
WOMBAT looks for a file with the same name as the .gin file but extension .codes; e.g. mother.codes for the random effect mother. This file is expected to supply the codes for all levels of the random effect: There has to be one line for each level with two space-separated integer variables, the running number (1st) and the code for the level (2nd).
If no such file is found, WOMBAT will look for a genetic effect (i.e. a random effect with covariance option NRM) which has the same number of levels as the current random effect. If found, it will simply copy the vector of identities for that effect and proceed. (Hint: you may have to use the run time option --noprune to utilise this feature).
Finally, if neither of these scenarios applies, WOMBAT will assume the random levels are coded from 1 to N and try to proceed without any further checking; this may cause problems!
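
The layout of the .gin file and of the companion .codes file described in this section can be sketched as follows. This is only an illustration, not part of WOMBAT: the function names write_gin and write_codes, the tolerance used to decide which elements count as non-zero, and the number of decimals written are all assumptions, and the matrix is inverted with NumPy rather than by any method WOMBAT prescribes.

```python
import numpy as np


def write_gin(path, cov, dense=False, tol=1e-12):
    """Write the inverse of `cov` in the three-column .gin layout.

    `cov` is the covariance/relationship matrix itself (not its inverse).
    `tol` (an assumption) decides which elements are treated as zero.
    """
    cov = np.asarray(cov, dtype=float)
    # First line: log determinant of the matrix, not of its inverse.
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("determinant of the matrix must be positive")
    inv = np.linalg.inv(cov)
    n = inv.shape[0]
    with open(path, "w") as fh:
        fh.write(f"{logdet:.10f}" + (" DENSE" if dense else "") + "\n")
        # Lower triangle, row by row: 'column' number <= 'row' number.
        for row in range(n):
            for col in range(row + 1):
                value = inv[row, col]
                if abs(value) > tol:
                    # Order on each line: column, row, element (1-based).
                    fh.write(f"{col + 1} {row + 1} {value:.10f}\n")


def write_codes(path, level_codes):
    """Write one line per level: running number, then the code for the level."""
    with open(path, "w") as fh:
        for running, code in enumerate(level_codes, start=1):
            fh.write(f"{running} {code}\n")
```

For a random effect called mother, the two files would be written as mother.gin and mother.codes, e.g. write_gin("mother.gin", A) and write_codes("mother.codes", ids), where A and ids are placeholders for the user's own matrix and list of level codes.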

6.5.2 Basis function file

If a regression on a user-defined set of basis functions has been chosen in the model of analysis by specifying the code USR for a covariable (or ‘control’ variable in a RR analysis), file(s) specifying the functions need to be supplied.

The form required for these files is:

The name of the file should be the name of the covariable (or ‘control’ variable), as given in the parameter file (model of analysis part), followed by _USR, the number of coefficients, and the extension .baf.

EXAMPLE: If the model of analysis includes the effect age and the maximum number of regression coefficients for age is 7, the corresponding input file expected is age_USR7.baf

N.B.: The file name does not include a trait number.

This implies that for multivariate analyses the same basis function is assumed to be used for a particular covariable across all traits. The only differentiation allowed is that the number of regression coefficients may differ between traits (i.e. a subset of coefficients may be fitted for some traits); in this case, the file supplied must correspond to the largest number of coefficients specified.

There should be one row for each value of the covariable.
Rows should correspond to values of the covariable in ascending order.
The number of columns in the file must be equal to (or larger than) the number of regression coefficients to be fitted (i.e. the order of fit) for the covariable.
The elements of the i-th row should be the user-defined functions evaluated for the i-th value of the covariable.

EXAMPLE: Assume the covariable has possible values of 1, 3, 5, 7 and 9, and that we want to fit a cubic regression on ‘ordinary’ polynomials, including the intercept. In this case, WOMBAT would expect to find a file with 5 rows (corresponding to the 5 values of the covariable) and 4 columns (corresponding to the 4 regression coefficients, i.e. intercept, linear, quadratic and cubic):

  1   1   1    1
  1   3   9   27
  1   5  25  125
  1   7  49  343
  1   9  81  729

Note that there is no leading column with the value of the covariable (you can add it as the last column which is ignored by WOMBAT, if you wish) – the association between value of covariable and user defined function is made through the order of records.
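
A file like the one in the example above could be generated along the following lines. This is a minimal sketch under stated assumptions: the function name write_polynomial_baf is hypothetical, the covariable name age is borrowed from the earlier file-name example, and ‘ordinary’ polynomials are taken to be plain powers of the covariable value.

```python
import os

import numpy as np


def write_polynomial_baf(covariable, values, n_coeff, directory="."):
    """Write ordinary polynomial basis functions to <covariable>_USR<n_coeff>.baf.

    One row per distinct covariable value, in ascending order; one column
    per regression coefficient (intercept, linear, quadratic, ...).
    """
    ordered = sorted(set(values))
    # Column k holds value**k, so column 0 is the intercept.
    basis = np.vander(np.asarray(ordered, dtype=float), N=n_coeff, increasing=True)
    path = os.path.join(directory, f"{covariable}_USR{n_coeff}.baf")
    with open(path, "w") as fh:
        for row in basis:
            # No leading column with the covariable value itself.
            fh.write(" ".join(f"{x:.10g}" for x in row) + "\n")
    return path
```

Calling write_polynomial_baf("age", [1, 3, 5, 7, 9], 4) writes a file named age_USR4.baf with the same five rows as shown in the example, apart from column alignment.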

6.5.3 File with allele counts

For an analysis using the run option --snap, an additional input file is required which supplies the counts for the reference allele for each QTL or SNP to be considered. This has the default name QTLAllels.dat or QTLAllelsR.dat, depending on whether integer or real input is chosen. If both exist in the working directory, WOMBAT will utilize the former and ignore the latter.

6.5.4 Files with results from part analyses

List of partial results

For a run with option --itsum or --pool, WOMBAT expects a number of files with results from part analyses as input. Typically, these have been generated by WOMBAT when carrying out these analyses; see 7.2.6 for further details.

Single, user generated input file

For run option --pool, results can be given in a single file instead. For each part analysis, this should contain the following information:

A line giving (space separated):
The number of traits in the part analysis
The (running) numbers of these traits in the full covariance matrix.
The relative weight to be given to this part; this can be omitted and, if not given, is set to 1.
The elements of the upper triangle of the residual covariance matrix, given row-wise.
For each random effect fitted, the elements of the upper triangle of its covariance matrix, given row-wise. Each matrix must begin on a new line and the matrices must be given in the same order as the corresponding VAR statements in the parameter file.
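
The entry for one part analysis, as listed above, might be assembled as in the sketch below. The function names are hypothetical, and two layout choices are assumptions rather than requirements stated here: each row of an upper triangle is written on its own line, and the residual matrix is started on a new line after the header.

```python
import numpy as np


def upper_triangle_rows(matrix):
    """Return the upper triangle of a square matrix, one text line per row."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    return [" ".join(f"{m[i, j]:.6g}" for j in range(i, n)) for i in range(n)]


def format_part(trait_numbers, residual, random_effects, weight=None):
    """Format one part analysis for a single pooled input file.

    trait_numbers  : running numbers of the traits in the full matrix
    residual       : residual covariance matrix for this part
    random_effects : covariance matrices, in the order of the VAR statements
    weight         : optional relative weight; omitted from the header if None
    """
    header = [str(len(trait_numbers))] + [str(t) for t in trait_numbers]
    if weight is not None:
        header.append(f"{weight:g}")
    lines = [" ".join(header)]
    lines += upper_triangle_rows(residual)
    for matrix in random_effects:
        # Each matrix begins on a new line.
        lines += upper_triangle_rows(matrix)
    return "\n".join(lines) + "\n"
```

The strings returned for the individual parts can then be concatenated and written to the single input file.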

6.5.5 ‘Utility’ files

WOMBAT will check for the existence of other files with default names in the working directory and, if they exist, acquire information from them.

File RunOptions

This file can be used as an alternative to the command line to specify run options (see 5).
It must have one line for each run option specified; e.g. a run with verbose output using the EM algorithm requires two lines, one for each of these options.

File FileSynonyms

In some cases, WOMBAT expects input files with specific names. If files with different default names have the same content, duplication can be avoided by setting up a file FileSynonyms to ‘map’ specific files to a single input file. This file should contain one line for each input file to be ‘mapped’ to another file. Each line should give two file names (space separated):

The default name expected by WOMBAT.
The name of the replacement file


age.baf      mybasefn.dat
damage.baf   mybasefn.dat

[Not yet implemented!]

File RandomSeeds

To simulate data, WOMBAT requires two integer values to initialise the random number generator. If the file RandomSeeds exists, it will attempt to read these values from it. Both numbers can be specified on the same or different lines. If the file does not exist in the working directory, or if an error reading is encountered, initial numbers are instead derived from the date and time of day.

WOMBAT writes out such a file in each simulation run, i.e. if RandomSeeds exists, it is overwritten with a new pair of numbers!

6.5.6 File SubSetsList

For a run with option --itsum, WOMBAT expects to read a list of names of files with results from subset analyses in a file with the standard name SubSetsList. This is generated by WOMBAT (see 7.3.9) if the part analyses have been carried out using WOMBAT, but may need editing. In particular, if a weighted summation is required, the default weights of ‘1.000’ need to be replaced ‘manually’ by appropriate values, selected by the user!

6.5.7 File(s) Pen*(.dat)

File PenTargetMatrix

For penalty options COVARM and CORREL a file with this name must be supplied which gives the shrinkage target. This must be a positive definite matrix. The file should be a plain text file and contain the elements of the upper triangle of the matrix. It is read in ‘free’ format, i.e. variable numbers of elements per line are allowed.

File PenBestPoints.dat

A run with the option --valid expects to read sets of estimates from a file with this name. This is generated by WOMBAT when penalized estimation is specified, but can be edited to suit or generated by other means. For each tuning factor, it should contain:

A line with the tuning factor (real variable) at the beginning
The elements of the upper triangle of the estimate of the residual covariance matrix (or equivalent) for this tuning factor. This is read in ‘free’ format, i.e. can be given over as many lines as suitable.
Starting on a new line: The elements of the upper triangle of the estimate of the genetic covariance matrix (or equivalent) for this tuning factor. Again, this is read in ‘free’ format.
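
As an illustration of the two Pen* layouts described in this section, the sketch below formats a shrinkage target and a set of estimates per tuning factor. The function names are hypothetical, and the particular line breaks (the whole upper triangle on one line, the tuning factor on a line of its own) are assumptions made within the ‘free’ format. The positive definiteness check uses a Cholesky factorisation from NumPy and assumes the input matrix is symmetric.

```python
import numpy as np


def upper_triangle(matrix):
    """Return the upper-triangle elements of a square matrix, row-wise."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    return [m[i, j] for i in range(n) for j in range(i, n)]


def format_target_matrix(matrix):
    """Format a shrinkage target in the PenTargetMatrix layout."""
    # Raises numpy.linalg.LinAlgError if the (symmetric) matrix
    # is not positive definite.
    np.linalg.cholesky(np.asarray(matrix, dtype=float))
    return " ".join(f"{x:.6g}" for x in upper_triangle(matrix)) + "\n"


def format_best_points(entries):
    """Format (tuning factor, residual, genetic) triples for PenBestPoints.dat."""
    lines = []
    for tuning, residual, genetic in entries:
        lines.append(f"{tuning:.6g}")
        lines.append(" ".join(f"{x:.6g}" for x in upper_triangle(residual)))
        # The genetic matrix starts on a new line.
        lines.append(" ".join(f"{x:.6g}" for x in upper_triangle(genetic)))
    return "\n".join(lines) + "\n"
```

The returned strings can be written to files named PenTargetMatrix and PenBestPoints.dat in the working directory.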