Wald F statistics

Analysis procedures

Introduction

Inference for fixed effects in linear mixed models introduces some difficulties. In general, the methods used to construct F-tests in analysis of variance and regression cannot be used for the diversity of applications of the general linear mixed model available in ASReml. One approach would be to use likelihood ratio methods (see Welham and Thompson, 1997) although their approach is not easily implemented.

Wald-type test procedures are generally favoured for conducting tests concerning t. The traditional Wald statistic to test the hypothesis H0: Lt = l, for given L (r x p) and l (r x 1), is given by
W = (Lt - l)' [L (X' H^{-1} X)^{-1} L']^{-1} (Lt - l)
and, asymptotically, this statistic has a chi-square distribution on r degrees of freedom. These are marginal tests, so that there is an adjustment for all other terms in the fixed part of the model. The test is also anti-conservative if p-values are constructed, because it assumes the variance parameters are known.

The small sample behaviour of such statistics has been considered in some detail by Kenward and Roger (1997). They presented a scaled Wald statistic, together with an F-approximation to its sampling distribution, which they showed performed well in a range of settings (though a limited range relative to the variance models available in ASReml).

In the following we describe the facilities in ASReml for inference concerning terms which are in the dense fixed effects model component of the general linear mixed model. These include facilities for computing two types of Wald statistics and a partial implementation of the Kenward and Roger adjustments. They are not available for any terms in the sparse model.

Incremental and Conditional Wald Statistics

The basic tool for inference is the Wald statistic defined in equation 14.1. However, there are several ways L can be defined to construct a test for a particular model term, two of which are available in ASReml. ASReml obtains an F-statistic by dividing the Wald statistic by r, the numerator degrees of freedom. In this form it is possible to perform an approximate F test if we can deduce the denominator degrees of freedom. For balanced designs, these Wald F statistics are numerically identical to the F-tests obtained from the standard analysis of variance.
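As a rough illustration of the calculation, the following Python sketch evaluates the Wald statistic W and the Wald F statistic W/r for a toy problem with two groups. It is not ASReml's implementation. It assumes H is known (here simply 2I), estimates the fixed effects by generalised least squares, and uses small helper functions (transpose, matmul, inverse, wald_f) that are hypothetical names written for this example only.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(M[k][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular matrix")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for k in range(n):
            if k != c:
                f = M[k][c]
                M[k] = [vk - f * vc for vk, vc in zip(M[k], M[c])]
    return [row[n:] for row in M]


def wald_f(X, H, y, L, l):
    """Return (W, W / r) for H0: Lt = l, treating H as known."""
    XtHinv = matmul(transpose(X), inverse(H))
    C = inverse(matmul(XtHinv, X))                  # (X' H^{-1} X)^{-1}
    t_hat = matmul(C, matmul(XtHinv, [[v] for v in y]))  # GLS estimate of t
    Lt = matmul(L, t_hat)
    d = [[Lt[i][0] - l[i]] for i in range(len(L))]  # Lt - l
    V = matmul(matmul(L, C), transpose(L))          # L (X' H^{-1} X)^{-1} L'
    W = matmul(matmul(transpose(d), inverse(V)), d)[0][0]
    r = len(L)                                      # numerator degrees of freedom
    return W, W / r


# Toy data: two groups of two observations, cell-means parameterisation.
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
H = [[2.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # assumed known
y = [1.0, 3.0, 5.0, 7.0]
L = [[1.0, -1.0]]   # contrast between the two group means
l = [0.0]

W, F = wald_f(X, H, y, L, l)
print(W, F)
```

With a single contrast (r = 1) the two values coincide. In practice H depends on estimated variance parameters, which is why a denominator degrees of freedom is needed for the F reference distribution.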

The first method for computing Wald statistics (for each term) is the so-called 'incremental' form. For this method, Wald statistics are computed from an incremental sum of squares, in the spirit of the approach used in classical regression analysis (see Searle, 1971). For example, if we consider a very simple model with terms relating to the main effects of two qualitative factors A and B, given symbolically by
y ~ 1 + A + B
where the 1 represents the constant term (mu), then the incremental sums of squares for this model can be written as the sequence
 R( 1)
 R( A | 1) = R( 1,A) - R( 1)
 R( B | 1,A) = R( 1,A,B) - R( 1,A)
where the R(.) operator denotes the reduction in the total sums of squares due to a model containing its argument and R(.|.) denotes the difference between the reductions in the sums of squares for any pair of (nested) models. Thus R( B | 1,A) represents the difference in the reduction in sums of squares between the so-called maximal model and the model without B, that is, between
y ~ 1 + A + B      and      y ~ 1 + A
Implicit in these calculations is that
• we only compute Wald statistics for estimable functions (Searle, 1971, page 408),
• all variance parameters are held fixed at the current REML estimates from the maximal model

In this example, it is clear that the incremental Wald statistics may not produce the desired test for the main effect of A, as in many cases we would like to produce a Wald statistic for A based on
R( A | 1,B) = R( 1,A,B) - R( 1,B)
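The dependence on fitting order can be seen in a small numerical sketch. The Python code below computes incremental reductions in sums of squares by sequential orthogonalisation of the design columns. It assumes ordinary least squares with no random effects, so it only illustrates the R(.|.) arithmetic and is not ASReml's algorithm. The data, the function name incremental_ss and the 0/1 coding of the factors are invented for the illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def incremental_ss(terms, y, tol=1e-10):
    """Incremental reductions in sums of squares.

    terms is a list of (name, columns) pairs in fitting order. Each column is
    orthogonalised against everything fitted before it, so the value stored
    for a term is R(term | all earlier terms).
    """
    basis = []      # mutually orthogonal columns fitted so far
    result = {}
    for name, columns in terms:
        ss = 0.0
        for col in columns:
            q = [float(v) for v in col]
            for b in basis:
                coef = dot(b, q) / dot(b, b)
                q = [qi - coef * bi for qi, bi in zip(q, b)]
            qq = dot(q, q)
            if qq > tol:            # skip columns aliased with earlier terms
                basis.append(q)
                ss += dot(q, y) ** 2 / qq
        result[name] = ss
    return result


# Unbalanced toy data: six observations, two two-level factors.
one = [1, 1, 1, 1, 1, 1]
A = [0, 0, 0, 1, 1, 1]      # indicator for the second level of A
B = [0, 0, 1, 0, 1, 1]      # indicator for the second level of B
y = [1.0, 2.0, 4.0, 3.0, 6.0, 7.0]

a_first = incremental_ss([("1", [one]), ("A", [A]), ("B", [B])], y)
b_first = incremental_ss([("1", [one]), ("B", [B]), ("A", [A])], y)

print("R(A | 1)   =", a_first["A"])
print("R(A | 1,B) =", b_first["A"])
```

Because the design is unbalanced, R( A | 1) and R( A | 1,B) differ, although both orders give the same total reduction R( 1,A,B).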

The issue is further complicated when we invoke `marginality' considerations. The issue of marginality between terms in a linear (mixed) model has been discussed in much detail by Nelder (1977). In this paper Nelder defines marginality for terms in a factorial linear model with qualitative factors, but later Nelder (1994) extended this concept to functional marginality for terms involving quantitative covariates and for mixed terms which involve an interaction between quantitative covariates and qualitative factors. Referring to our simple illustrative example above, with a full factorial linear model given symbolically by
y ~ 1 + A + B + A.B
then A and B are said to be marginal to A.B, and 1 is marginal to A and B. In a three way factorial model given by
y ~ 1 + A + B + C + A.B + A.C + B.C + A.B.C
the terms A, B, C, A.B, A.C and B.C are marginal to A.B.C. Nelder (1977, 1994) argues that meaningful and interesting tests for terms in such models can only be conducted for those tests which respect marginality relations. This philosophy underpins the following description of the second Wald statistic available in ASReml, the so-called 'conditional' Wald statistic. This method is invoked by placing !FCON on the datafile line. ASReml attempts to construct conditional Wald statistics for each term in the fixed dense linear model so that marginality relations are respected. As a simple example, for the three way factorial model the conditional Wald statistics would be computed as
Term    M code  Sums of Squares
1       .       R( 1)
A       A       R( A | 1,B,C,B.C) = R( 1,A,B,C,B.C) - R( 1,B,C,B.C)
B       A       R( B | 1,A,C,A.C) = R( 1,A,B,C,A.C) - R( 1,A,C,A.C)
C       A       R( C | 1,A,B,A.B) = R( 1,A,B,C,A.B) - R( 1,A,B,A.B)
A.B     B       R( A.B | 1,A,B,C,A.C,B.C) = R( 1,A,B,C,A.B,A.C,B.C) - R( 1,A,B,C,A.C,B.C)
A.C     B       R( A.C | 1,A,B,C,A.B,B.C) = R( 1,A,B,C,A.B,A.C,B.C) - R( 1,A,B,C,A.B,B.C)
B.C     B       R( B.C | 1,A,B,C,A.B,A.C) = R( 1,A,B,C,A.B,A.C,B.C) - R( 1,A,B,C,A.B,A.C)
A.B.C   C       R( A.B.C | 1,A,B,C,A.B,A.C,B.C) = R( 1,A,B,C,A.B,A.C,B.C,A.B.C) - R( 1,A,B,C,A.B,A.C,B.C)
Of these the conditional Wald statistic for the 1, B.C and A.B.C terms would be the same as the incremental Wald statistics produced using the linear model
y ~ 1 + A + B + C + A.B + A.C + B.C + A.B.C
The preceding table includes a so-called M (marginality) code reported by ASReml when conditional Wald statistics are presented. All terms with the highest M code letter are tested conditionally on all other terms in the model, i.e. by dropping the term from the maximal model. All terms with the preceding M code letter are marginal to at least one term in a higher group, and so forth. For example, in the table, model term A.B has M code B because it is marginal to model term A.B.C, and model term A has M code A because it is marginal to A.B, A.C and A.B.C. Model term mu (M code .) is a special case in that it is marginal to factors in the model but not to covariates.
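The marginality rules for a purely factorial model can be sketched in a few lines of Python. The code below is a simplified illustration, not ASReml's algorithm: it assumes every term is the constant or a product of qualitative factors written with "." separators, and it ignores covariates and the hidden nesting discussed later. The function names parse, conditional_model and m_codes are hypothetical.

```python
def parse(term):
    """Represent a term as the set of factors it involves ('1' is the empty set)."""
    return frozenset() if term == "1" else frozenset(term.split("."))


def conditional_model(term, model):
    """Terms conditioned on when testing `term`.

    Every term to which `term` is marginal (every term that contains all of
    its factors) is dropped, together with `term` itself.
    """
    t = parse(term)
    return [u for u in model if u != term and not t <= parse(u)]


def m_codes(model):
    """Assign M codes: terms marginal to nothing get the highest letter."""
    sets = {t: parse(t) for t in model}

    def depth(t):
        above = [u for u in model if sets[t] < sets[u]]   # t is marginal to u
        return 1 + max(depth(u) for u in above) if above else 0

    depths = {t: depth(t) for t in model if sets[t]}      # the constant is handled apart
    top = max(depths.values(), default=0)
    codes = {t: chr(ord("A") + top - d) for t, d in depths.items()}
    for t in model:
        if not sets[t]:
            codes[t] = "."
    return codes


three_way = ["1", "A", "B", "C", "A.B", "A.C", "B.C", "A.B.C"]
codes = m_codes(three_way)
for term in three_way:
    print(term, codes[term], conditional_model(term, three_way))
```

For the three way factorial model this reproduces the M codes and conditioning sets in the table, and for y ~ 1 + REGION + REGION.SITE it gives the codes ., A and B.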

Consider now a nested model which might be represented symbolically by
y ~ 1 + REGION + REGION.SITE
For this model, the incremental and conditional Wald tests will be the same. However, it is not uncommon for this model to be presented to ASReml as
y ~ 1 + REGION + SITE
with SITE identified across REGION rather than within REGION. Then the nested structure is hidden but ASReml will still detect the structure and produce a valid conditional Wald F-statistic. This situation will be flagged in the M code field by changing the letter to lower case. Thus, in the nested model, the three M codes would be ., A and B because REGION.SITE is obviously an interaction dependent on REGION. In the second model, REGION and SITE appear to be independent factors so the initial M codes are ., A and A. However they are not independent because REGION removes additional degrees of freedom from SITE, so the M codes are changed from ., A and A to ., a and A.

We strongly recommend, if you are in any doubt about the conditional maximal model for the conditional Wald F-statistic, that you consult the .aov file, which spells out the actual model for each term. We also advise users that the aim of the conditional Wald statistic is to facilitate inference for fixed effects. It is not meant to be prescriptive, nor is it foolproof for every setting.

The Wald statistics are collectively presented in a summary table in the .asr file. The basic table includes the numerator degrees of freedom (denoted nu1) and the incremental Wald F-statistic for each term. To this is added the conditional Wald F-statistic and the M code if !FCON is specified. A conditional F-statistic is not reported for mu in the .asr file but is reported in the .aov file (adjusted for covariates).

In moderately sized analyses, ASReml will also include the denominator degrees of freedom (DenDF, denoted by nu2; Kenward and Roger, 1997) and a probability value if these can be computed. They will be for the conditional Wald F-statistic if it is reported. The !DDF i qualifier can be used to suppress the DenDF calculation (!DDF -1) or to request a particular algorithmic method: !DDF 1 for numerical derivatives, !DDF 2 for algebraic derivatives. The value in the probability column (either Pad or Pcn) is computed from an F(nu1, nu2) reference distribution. When the DenDF is not available, it is possible, though anti-conservative, to use the residual degrees of freedom for the denominator.

Kenward and Roger (1997) pursued the concept of construction of Wald-type test statistics through an adjusted variance matrix of t. They argued that it is useful to consider an improved estimator of the variance matrix of t which has less bias and accounts for the variability in estimation of the variance parameters. There are two reasons for this. Firstly, the small sample distribution of Wald tests is simplified when the adjusted variance matrix is used. Secondly, if measures of precision are required for t or effects therein, those obtained from the adjusted variance matrix will generally be preferred. Unfortunately the Wald statistics are currently computed using an unadjusted variance matrix.

Approximate stratum variances

ASReml reports approximate stratum variances and degrees of freedom for simple variance components models. For the linear mixed-effects model with variance components (setting sigma^2_H = 1), where G is the direct sum of the gamma_j I_b, it is often possible to consider a natural ordering of the variance component parameters, including sigma^2. Based on an idea due to Thompson (1980), ASReml computes approximate stratum degrees of freedom and stratum variances by a modified Cholesky diagonalisation of the expected (or average) information matrix. That is, if F is the average information matrix for sigma, let U be an upper triangular matrix such that F = U'U. Further, we define
U_c = D_c U
where D_c is a diagonal matrix whose elements are given by the inverse elements of the last column of U, i.e. d_cii = 1/u_ir, i = 1, ..., r. The matrix U_c is therefore upper triangular with the elements in the last column equal to one. If the vector sigma is ordered in the natural way, with sigma^2 being the last element, then we can define the vector of so-called pseudo stratum variance components by
xi = U_c sigma
Thence
var(xi) = D_c^2
The diagonal elements can be manipulated to produce effective stratum degrees of freedom (Thompson, 1980), viz.
nu_i = 2 xi_i^2 / d_cii^2
In this way the closeness to an orthogonal block structure can be assessed.
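The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than ASReml's code: it uses an ordinary Cholesky factorisation to obtain an upper triangular U with F = U'U, and the information matrix in the demonstration is constructed by hand for a balanced one-way layout (6 blocks of 4 units, block component 2 and residual variance 1) so that the expected answer is known. The function names are hypothetical.

```python
import math


def cholesky_upper(F):
    """Upper triangular U with F = U'U, for symmetric positive definite F."""
    n = len(F)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = F[i][i] - sum(U[k][i] ** 2 for k in range(i))
        U[i][i] = math.sqrt(s)
        for j in range(i + 1, n):
            U[i][j] = (F[i][j] - sum(U[k][i] * U[k][j] for k in range(i))) / U[i][i]
    return U


def stratum_decomposition(F, sigma):
    """Return (U_c, xi, nu) given the information matrix F for sigma.

    sigma must be ordered with the residual variance last.
    """
    r = len(sigma)
    U = cholesky_upper(F)
    d = [1.0 / U[i][r - 1] for i in range(r)]            # diagonal of D_c
    Uc = [[d[i] * U[i][j] for j in range(r)] for i in range(r)]
    xi = [sum(Uc[i][j] * sigma[j] for j in range(r)) for i in range(r)]
    nu = [2.0 * xi[i] ** 2 / d[i] ** 2 for i in range(r)]
    return Uc, xi, nu


# Balanced one-way layout, assumed for illustration: strata are blocks
# (xi = 4 * 2 + 1 = 9 on 5 df) and residual (xi = 1 on 18 df).
a = 5.0 / (2.0 * 9.0 ** 2)      # information on the block stratum variance
c = 18.0 / (2.0 * 1.0 ** 2)     # information on the residual stratum variance
F_demo = [[16.0 * a, 4.0 * a],
          [4.0 * a, a + c]]     # information matrix for sigma = (block, residual)
sigma_demo = [2.0, 1.0]

Uc, xi, nu = stratum_decomposition(F_demo, sigma_demo)
print("coefficients:", Uc)
print("stratum variances:", xi)
print("stratum degrees of freedom:", nu)
```

In this orthogonal case the rows of U_c recover the usual expected mean square coefficients and the degrees of freedom are whole numbers. Non-integer values would indicate a departure from an orthogonal block structure.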

Example

```
          Approximate stratum variance decomposition
Stratum     Degrees-Freedom   Variance      Component Coefficients
blocks                 5.00    3175.06        12.0     4.0     1.0
blocks.wplots         10.00    601.331         0.0     4.0     1.0
Residual Variance     45.00    177.083         0.0     0.0     1.0

Source          Model  terms     Gamma     Component    Comp/SE   % C
blocks              6      6   1.21116       214.477       1.27   0 P
blocks.wplots      18     18  0.598937       106.062       1.56   0 P
Variance           72     60   1.00000       177.083       4.74   0 P

Wald F statistics
Source of Variation     NumDF     DenDF    F-inc             Prob
7 mu                          1       5.0   245.14            <.001
4 variety                     2      10.0     1.49            0.272
2 nitrogen                    3      45.0    37.69            <.001
8 variety.nitrogen            6      45.0     0.30            0.932
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using algebraic derivatives.

```
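The relationship between the two tables in this output can be checked directly: each stratum variance should equal the corresponding row of component coefficients applied to the variance components. The short Python sketch below copies the values from the listing above and recomputes the stratum variances; the variable names are chosen for this illustration only.

```python
# Values copied from the output listing (rounded as printed).
coefficients = [
    [12.0, 4.0, 1.0],   # blocks
    [0.0, 4.0, 1.0],    # blocks.wplots
    [0.0, 0.0, 1.0],    # Residual Variance
]
components = [214.477, 106.062, 177.083]   # blocks, blocks.wplots, Variance
reported = [3175.06, 601.331, 177.083]     # Variance column of the first table

# Stratum variance = row of coefficients times the vector of components.
recomputed = [sum(c * s for c, s in zip(row, components)) for row in coefficients]

for name, rep, rec in zip(["blocks", "blocks.wplots", "Residual"], reported, recomputed):
    print(f"{name:15s} reported {rep:10.3f}   recomputed {rec:10.3f}")
```

The recomputed values agree with the reported stratum variances to within the rounding of the printed output.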