Within a maximum likelihood framework of estimation, tests based on the (log) likelihood are the natural choice for hypothesis testing.
REML, short for restricted or residual maximum likelihood, estimation involves maximising the likelihood of the residuals, i.e. the observations adjusted for the estimates of the fixed effects fitted. In this way, REML accounts for the degrees of freedom used in fitting fixed effects - in contrast to `full' maximum likelihood, which does not and thus yields underestimates of the residual variance.
However, this implies that the REML likelihood does not contain any information about the fixed effects fitted. Consequently, only models which fit exactly the same fixed effects (and have nested random effects) can be compared using likelihood based tests.
Say we have a vector of parameters $\boldsymbol{\theta}$ (i.e. (co)variance components or functions of such components) with REML estimates $\hat{\boldsymbol{\theta}}$, and let $\log L(\hat{\boldsymbol{\theta}})$ denote the corresponding maximum log likelihood value. Partition $\boldsymbol{\theta}$ into the vector of $p$ parameters we want to test, $\boldsymbol{\theta}_1$, and the remaining $q$ parameters, $\boldsymbol{\theta}_2$ (re-ordering if necessary). Further, let the null hypothesis to be tested be
$$ H_0 : \boldsymbol{\theta}_1 = \boldsymbol{\theta}_1^0 $$
with the alternative hypothesis
$$ H_A : \boldsymbol{\theta}_1 \neq \boldsymbol{\theta}_1^0 $$
To carry out a likelihood ratio test, we need to find the REML estimates of $\boldsymbol{\theta}_2$ fixing $\boldsymbol{\theta}_1$ at the `test values' $\boldsymbol{\theta}_1^0$, and the corresponding conditional maximum log likelihood value $\log L(\boldsymbol{\theta}_1^0, \hat{\boldsymbol{\theta}}_2)$.
The likelihood ratio test criterion is then computed as
$$ \lambda = 2 \left[ \log L(\hat{\boldsymbol{\theta}}) - \log L(\boldsymbol{\theta}_1^0, \hat{\boldsymbol{\theta}}_2) \right] $$
Asymptotically, $\lambda$ has a $\chi^2$ distribution with $p$ degrees of freedom. Hence, $H_0$ is rejected if $\lambda$ exceeds the value of the $\chi^2$ distribution for $p$ degrees of freedom and a chosen error probability, and is accepted otherwise. For example, for $p = 1$ and an error probability of 5%, $\lambda$ needs to exceed 3.84 for $H_0$ to be rejected.
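As a minimal sketch, the test can be carried out in a few lines of Python using `scipy.stats.chi2`; the log likelihood values shown are hypothetical, for illustration only:

```python
from scipy.stats import chi2

def likelihood_ratio_test(logL_full, logL_reduced, p):
    """Likelihood ratio test statistic and asymptotic chi-squared p-value,
    where the reduced model fixes p parameters of the full model."""
    lam = 2.0 * (logL_full - logL_reduced)
    return lam, chi2.sf(lam, df=p)

# Hypothetical REML log likelihoods for the full and reduced models:
lam, p_value = likelihood_ratio_test(-1234.5, -1237.8, p=1)

# The 5% critical value for 1 degree of freedom (about 3.84):
critical = chi2.ppf(0.95, 1)
```

Here `lam` is 6.6, which exceeds the 5% critical value, so the reduced model would be rejected at that level.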
Note that the use of likelihood ratio tests for model comparison is only valid for nested models.
Note further that the alternative hypothesis implies a two-sided test. Hence it is not directly applicable for tests at the boundary of the parameter space, e.g. to test whether a variance component is greater than zero or not. In this case, $\lambda$ is distributed as a mixture of $\chi^2$ variables, with details depending on the values of $p$ and $q$ and on how many of the elements of $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ are at the boundary. Details are given by Self and Liang (1987); see also Dominicus et al. (2006) or Visscher (2006) for discussions in a genetics (variance component) context and suitable adjustments in simple cases.
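In the simplest boundary case - testing a single variance component against zero - the mixture reduces to equal parts $\chi^2_0$ (a point mass at zero) and $\chi^2_1$, so the naive p-value is simply halved. A sketch under that assumption (the function name is illustrative):

```python
from scipy.stats import chi2

def boundary_p_value(lam):
    """P-value for testing H0: a single variance component equals zero,
    using the 50:50 mixture of chi^2_0 and chi^2_1 (Self and Liang, 1987).
    The chi^2_0 part is a point mass at zero, so for lam > 0 only the
    chi^2_1 half of the mixture contributes."""
    return 0.5 * chi2.sf(lam, df=1)

# Halving the p-value lowers the 5% critical value from 3.84 to about 2.71:
critical = chi2.ppf(0.90, 1)
```

Equivalently, one can keep the usual $\chi^2_1$ critical value but test at twice the nominal error probability.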
Likelihood ratio tests are known to favour more detailed models. Hence, for scenarios involving many parameters, model comparisons are often based on the likelihood penalised for the number of parameters estimated.
Likelihood ratio tests are known to favour the most detailed model. This is due to $\log L$ being a biased estimator of the `information'. Akaike (1973) showed that this bias is approximately proportional to the number of parameters estimated. The Akaike information criterion (AIC) (pronounced, approximately, ah-kah-ee-kay) corrects for this bias by penalising the likelihood accordingly, i.e.
$$ \text{AIC} = -2 \log L + 2p $$
with $p$ the number of parameters estimated. The model considered to fit the data `best' is then the model with the lowest AIC value.
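As a small worked example (with made-up log likelihood values and parameter counts), the AIC comparison amounts to:

```python
def aic(logL, p):
    """Akaike information criterion: -2 log L penalised by twice the
    number of parameters estimated."""
    return -2.0 * logL + 2.0 * p

# Hypothetical REML log likelihoods and numbers of (co)variance components:
models = {"A": (-1050.2, 3), "B": (-1048.9, 5)}

aic_values = {name: aic(logL, p) for name, (logL, p) in models.items()}
best = min(aic_values, key=aic_values.get)
# Model A is preferred: its poorer fit (2.6 larger in -2 log L) costs
# less than model B's extra penalty of 4 for two additional parameters.
```

Note that AIC values are only meaningful for comparing models fitted to the same data; with REML, that again means models fitting the same fixed effects.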
As emphasized above, REML estimation maximises the part of the likelihood which does not depend on the fixed effects fitted. This implies that the REML maximum log likelihood cannot be used directly in making any inference about fixed effects.
See Tess et al. (1993) and Welham et al. (1997) for discussions on testing of fixed effects in conjunction with REML estimation.