Generalized Linear Mixed Models

Introduction

ASReml uses an approximate likelihood technique called penalized quasi-likelihood (PQL) to analyse data sampled from one of the common members of the exponential family. In this section we present a few examples to demonstrate the coding in ASReml.

Binomial analysis of Footrot score

Mohommad Alwan (pers comm) for his Master thesis at Massey University scored the feet of 2513 lambs born in 1980 and 1981. The lambs were from 5 mating groups: 7 Perendale rams over Perendale ewes in 1980, 6 Booroola by Romney rams over Perendale ewes in 1980, 3 Booroola rams over Romney ewes in 1980, 6 Perendale rams over Perendale ewes in 1981, and 12 Booroola by Romney rams (from froup 3) over Perendale ewes in 1981. This data was analysed by Gilmour (1984) and Gilmour et al. (1987).

The data file LAMB.DAT contains grouped data for the 68 combinations of Sex and Sire for two footshape classes: FS1, all four feet are normal, FS2, one foot is deformed; and two indicator variables for the presence of disease conditions Scald and Rot. No scald or rot was present in group 4 lambs and these responses have been set to missing. The genetic relationships among sires are ignored in this analysis although it would just require a sire relationship matrix to include them.

Our first analysis is of the incidence of foot rot on the Normal scale as a weighted analysis to mimic analysis of the ungrouped data. Using 56 of the 68 records (ignoring Group 4), there are 1960 (=56 cross 35.00) observations and so we use the !DF 1904 (=1960-56) qualifier to get the correct residual degrees of freedom for this analysis of the proportion with footrot. The !YSS 62.54249 qualifier adds 62.54249=67-4.45751 to the Total Sum of Squares so that it includes the extra variation associated with the extra degrees of freedom. There were 67 (=56*1.196) cases of foot rot so the Total uncorrected Sum of Squares for a binary variable should be 67. However the weighted sum of squares for the pRot values is only 4.45751 (for example the first record contributes 1/39=(1/39)2 cross 39 instead of 1.0. 4.45751 was discovered from the .asl file on the line 4.45751 SSPD before inserting the !YSS qualifier. The transformations in the code which follows convert Scald and Rot to 'missing' for group 4.
 Lamb data from ARG thesis page 177-8
  Year   GRP  5   !V99=V2 !==4 !M1
  SEX  SIRE !I
  Total
  FS1  FS2  Scald !+V99 Rot  !+V99
  pRot !=Rot !/Total
 # 1  1  1  101  39 33  6  6  1
 LAMB.DAT !skip 1
  !DF 1904  !YSS  62.54249
 pRot  !TOTAL=Total  ~ mu SEX GRP !r SIRE
 predict SEX 0 1 GRP 1 2 3 5

The pertinant results are
 Univariate analysis of pRot
 Summary of 56 records retained of 68 read

  Model term      Size #miss #zero   MinNon0    Mean      MaxNon0  StndDevn
   1 Year                  0     0  1.000      1.536      2.000     0.5032
   2 GRP             5     0     0      1     3.1429          5
   3 SEX                   0    28  1.000     0.5000      1.000     0.5045
   4 SIRE           34     0     0      1    17.0714         34
   5 Total    Weight       0     0  16.00      35.00      64.00      12.89
   6 FS1                   0     0  6.000      23.46      50.00      10.76
   7 FS2                   0     0  3.000      10.14      30.00      5.661
   8 Scald                 0    13  1.000      3.071      16.00      3.458
   9 Rot                   0    19  1.000      1.196      4.000      1.151
  10 pRot     Variate      0    19 0.1754E-01 0.3606E-01 0.1818     0.3833E-01
  11 mu                          1
   12 SEX.GRP                     5  3 SEX :   1   2 GRP :    5
 Forming      46 equations:  12 dense.
 Initial updates will be shrunk by factor    0.224
 Notice: Algebraic Denominator DF calculation is not available
         Numerical derivatives will be used.
 Notice:      4 singularities detected in design matrix.
   1 LogL= 2423.41     S2= 0.32397E-01   1952 df    :   1 components restrained
   2 LogL= 2431.71     S2= 0.32792E-01   1952 df   0.6325E-02  1.000
   3 LogL= 2431.80     S2= 0.32737E-01   1952 df   0.9265E-02  1.000
   4 LogL= 2431.80     S2= 0.32738E-01   1952 df   0.9200E-02  1.000
 Final parameter values                       0.92543E-02 1.0000

          - - - Results from analysis of pRot - - -

          Approximate stratum variance decomposition
 Stratum     Degrees-Freedom   Variance      Component Coefficients
 SIRE                  25.70   0.506971E-01    59.7     1.0
 Residual Variance     15.83   0.327367E-01     0.0     1.0

 Source                Model  terms     Gamma     Component    Comp/SE   % C
 SIRE                     34     34  0.918415E-02  0.300659E-03   0.98 -22 P
 Variance                 56   1952   1.00000      0.327367E-01   2.81   0 P

                                   Wald F statistics
     Source of Variation           NumDF     DenDF    F-inc             Prob
  11 mu                                1      19.9    42.79            <.001
   3 SEX                               1      16.2     0.02            0.882
   2 GRP                               3      21.9     2.04            0.139
  12 SEX.GRP                           3      16.1     0.39            0.763
 Notice: The DenDF values are calculated ignoring fixed/boundary/singular
             variance parameters using numerical derivatives.
   4 SIRE                                 34 effects fitted (       6 are zero)
Two things stand out in this analysis. From a genetic perspective, the heritability estimate is 0.0364=4 *.0003007/(.0003007+.0327367) This can be calculated in ASReml with the .pin file commands
 F GenVar 1*4
 F TotVar 1 2
 H heritability 3 4
Secondly, there is little evidence of significant difference between classes. The predicted values are
Sex PxP 1980 BRxP 1980 BxR 1980 BRxP 1981
0 0.0183 +/- 0.0130 0.0432 +/- 0.0126 0.0758 +/- 0.0268 0.0305 +/- 0.0111
1 0.0152 +/- 0.0132 0.0375 +/- 0.0124 0.0603 +/- 0.0244 0.0425 +/- 0.0108
An analysis of footrot as a binomial variable using the logistic link is performed by the model line (and dropping the !DF qualifier).
Rot !bin !TOTAL=Total ~ mu SEX GRP SEX.GRP !r SIRE .16783 pertinant results are
 Distribution and link: Binomial; Logit  Mu=P=1/(1+exp(-XB))
                                          V=Mu(1-Mu)/N
 Warning: The LogL value is unsuitable for comparing GLM models

 Notice:      4 singularities detected in design matrix.
   1 LogL=-28.1544     S2=  1.0000         48 df    Dev/DF=  0.9060
   2 LogL=-28.7417     S2=  1.0000         48 df    Dev/DF=  0.8897
   3 LogL=-28.7186     S2=  1.0000         48 df    Dev/DF=  0.8805
   4 LogL=-28.6705     S2=  1.0000         48 df    Dev/DF=  0.8551
   5 LogL=-28.6494     S2=  1.0000         48 df    Dev/DF=  0.8238
   6 LogL=-28.6687     S2=  1.0000         48 df    Dev/DF=  0.7959
   7 LogL=-28.6774     S2=  1.0000         48 df    Dev/DF=  0.7915
   8 LogL=-28.6784     S2=  1.0000         48 df    Dev/DF=  0.7909
   9 LogL=-28.6785     S2=  1.0000         48 df    Dev/DF=  0.7908
 Final parameter values                       0.26321     1.0000
 Deviance from GLM fit              48       37.96
 Variance heterogeneity factor [Deviance/DF]     0.79

          - - - Results from analysis of Rot - - -
 Notice: While convergence of the LogL value indicates that the model
         has stabilized, its value CANNOT be used to formally test differences
         between Generalized Linear (Mixed) Models.

          Approximate stratum variance decomposition
 Stratum     Degrees-Freedom   Variance      Component Coefficients
 SIRE                   3.10   0.263207         1.0

 Source                Model  terms     Gamma     Component    Comp/SE   % C
 SIRE                     34     34  0.263207      0.263207       1.25   0 P
 Variance                 56     48   1.00000       1.00000       0.00   0 F

                                   Wald F statistics
     Source of Variation           NumDF     DenDF    F-inc             Prob
  11 mu                                1      20.2   418.38            <.001
   3 SEX                               1      48.0     0.02            0.881
   2 GRP                               3      21.5     1.99            0.146
  12 SEX.GRP                           3      NA       0.36               NA
The effects in this analysis are on a logistic scale with a variance of 3.28987=π2/3 and so the heritability on the underlying (logistic) scale is 0.296=4 * 0.2632/( 3.28987+0.26321). This can be calculated in ASReml with the .pin file commands
 F GenVar 1*4
 F TotVar 1 4*3.28987
 H heritability 3 4
Repeating the analysis on the Probit scale by inserting !PROBIT after !BIN in the model line produces a Sire component of 0.0514 on the Probit scale which has an underlying variance of 1.0. The heritabily estimate is then 0.196. Given the incidence (0.034), the heritability on the probit scale is expected to be around 0.215=0.0364/(z2/pq) where z=0.0758 is the ordinate of a Normal(0,1) corresponding to p=1-q=0.034.

The preceding Wald F Statistics pertain to the working variable created as part of the PQL analysis. The SEX.GRP interaction is clearly not significant even though ASReml was not able to calculate a plausible value for the Denominator DF for this summarized data. The predicted means shown below are not that different from those obtained from analysis on the 0,1 scale but the standard errors are very different. These predicted means have been backtransformed by \ASReml\ from the underlying (logistic) scale to the probablity scale. The initial analysis (on the 0,1 probability scale) ignores the variance differences associated with binomial data.
PxP 1980 BRxP 1980 BxR 1980 BRxP 1981
0 0.0180 +/- 0.0070 0.0430 +/- 0.0124 0.0748 +/- 0.0323 0.0281 +/- 0.0083
1 0.0151 +/- 0.0063 0.0373 +/- 0.0110 0.0592 +/- 0.0257 0.0401 +/- 0.0103
ASReml has an 'Analysis of Deviance' option which we now demonstrate. In a mixed model, the variance components will change depending on which fixed terms are in the model. This will invalidate the Analysis of Deviance unless the variance components are fixed at the full model solution. So, fitting the model line
Rot !bin !TOT=Total !AODEV ~ mu SEX GRP SEX.GRP !r SIRE .2632 !GF} produces the Analysis of Deviance
                    Analysis of Deviance Table for Rot
  Source of Variation               df    Deviance   Derived F
  SEX                                1        0.02       0.021
  GRP                                3        4.35       1.833
  SEX.GRP                            3        1.16       0.487
 Deviance from GLM fit              48       37.96
 Variance heterogeneity factor [Deviance/DF]     0.79
The Deviance is the deviance calculated from the binomial part of the log-likelihood. This is distinct from the log-likelihood obtained by the REML algorithm which pertains to the working variable. Since the working variable changes with the model fitted, the LogL values are not comparable between models. The heterogeneity factor is the Deviance / df and gives some indication as to how well the discrete distribution fits the data. A value greater than 1 suggests the data is over-dispersed, that is the data values are more variable than expected under the chosen distribution. There is also a !DISPERSION [d] qualifier. If d is supplied, it serves as a scaling factor for the weights in the analysis, changing the reported variances and standard deviations. If d is not supplied, it is estimated from the residual as the model is fitted to the working variable. ASReml solves for the linear effects twice (see the !GLMM qualifier) iteration of the variance components so that the variance component updates are based on solutions obtained using the same variance parameters. I.e. We start with a set of solutions and some parameters. We use these to update the solutions. Then use the updated solutions to update the variance parameter.

Bivariate analysis of binomial Foot score and Normal weight

The data file BINNOR.txt contains the expanded version (2513 records) of the lamb data from the previous example augmented with an extra simulated variable YVar. It was created from the summarized data without knowing which actual individuals had which combinations of trait values. The binary variable Score1 indicates whether all four feet are sound. The following code produces a bivariate analysis of Score1 on the underlying logistic scale and YVar on the Normal scale.
 Lamb data from ARG thesis page 177-8
  Year   GRP  5      !V99=V2 !==4 !M 1
  SEX  SIRE !I
  Score1
  Score2  Scald !+V99 Rot  !+V99
  YVar
 binnor.txt !skip 1   !ASUV  !MAXIT 40

 Score1 YVar !bin ~ Trait.SEX Trait.GRP !r Trait.SIRE
 1 2 1
 2513
 2 0 US  !GFPP
 1 .01 0.25

 Trait.SIRE 2
 Trait 0 US 0.015 0.01 1.05
 SIRE
There are several issues addressed in this code.
  • !ASUV is required, and if there had been any missing values in the data, the fixed model term mv would also be required.
  • ASReml constructs the R matrix by scaling the reported matrix by the binomial variance calculated from the fitted value of the binomial variate. Consequently, to avoid over/under dispersion being also fitted, the residual 'variance' for the binomial trait is fixed at 1.0 by giving its initial value as 1.0 and using the qualifier !GFPP.
  • The response variables must be listed before the qualifiers. If written as \as{Score !BIN YVar}, YVar would be parsed as an argument to !BIN rather than as a response variable.
  • Only one categorical response is permitted, and it must be specified first.

    output follows.
     Distribution and link: Binomial; Logit  Mu=P=1/(1+exp(-XB))
                                              V=Mu(1-Mu)/N
     Warning: The LogL value is unsuitable for comparing GLM models
       1 LogL=-894.974     S2=  1.0000       5014 df    Dev/DF=  0.6196
       2 LogL=-894.554     S2=  1.0000       5014 df    Dev/DF=  0.6194
       3 LogL=-890.600     S2=  1.0000       5014 df    Dev/DF=  0.6178
       4 LogL=-884.431     S2=  1.0000       5014 df    Dev/DF=  0.6144
       5 LogL=-885.759     S2=  1.0000       5014 df    Dev/DF=  0.6109
       6 LogL=-892.413     S2=  1.0000       5014 df    Dev/DF=  0.6085
       7 LogL=-896.969     S2=  1.0000       5014 df    Dev/DF=  0.6077
       8 LogL=-897.941     S2=  1.0000       5014 df    Dev/DF=  0.6076
       9 LogL=-897.962     S2=  1.0000       5014 df    :   1 components restrained
      10 LogL=-897.962     S2=  1.0000       5014 df    Dev/DF=  0.6076
      11 LogL=-897.961     S2=  1.0000       5014 df    Dev/DF=  0.6076
     Deviance from GLM fit            5014     3046.50
     Variance heterogeneity factor [Deviance/DF]     0.61
    
              - - - Results from analysis of Score1 YVar - - -
     Notice: While convergence of the LogL value indicates that the model
             has stabilized, its value CANNOT be used to formally test differences
             between Generalized Linear (Mixed) Models.
    
     Source                Model  terms     Gamma     Component    Comp/SE   % C
     Residual        UnStructured  2  1 -0.162615E-03 -0.162615E-03  -0.03   0 P
     Residual        UnStructured  2  2  0.255609      0.255609      35.20   0 P
     Trait.SIRE        UnStructured 1 1  0.166092      0.166092       2.73   0 U
     Trait.SIRE        UnStructured 2 1  0.330313E-02  0.330313E-02   0.07   0 U
     Trait.SIRE        UnStructured 2 2  0.303900      0.303900       3.76   0 U
     Covariance/Variance/Correlation Matrix UnStructured Residual
       1.000     -0.3216E-03
     -0.1626E-03  0.2556
     Covariance/Variance/Correlation Matrix UnStructured Trait.SIRE
      0.1661      0.1470E-01
      0.3303E-02  0.3039
    
                                       Wald F statistics
         Source of Variation           NumDF    DenDF-con F-inc    F-con M P-con
      11 Trait.SEX                         2      NA     393.15    76.10 A    NA
      12 Trait.GRP                        10      40.9  1993.52  1993.52 A <.001
     Notice: The DenDF values are calculated ignoring fixed/boundary/singular
                 variance parameters using numerical derivatives.
    
    YVar data was artificially created and the SIRE variance is too large represent purely genetic variance. Ordinal analysis of Footrot score By way of introduction to ordinal analysis in ASReml consider the cheese data from page 175 of McCullagh and Nelder (1994). Four cheeses were scored on a ninepoint scale by 52 tasters giving

    Table 1 Response frequencies in a cheese tasting experiment
    I II III IV V VI VII VIII IX Total
    A 0 0 1 7 8 8 19 8 1 52
    B 6 9 12 11 7 6 1 0 0 52
    C 1 1 6 8 23 7 5 1 0 52
    D 0 0 0 1 3 7 14 16 11 52

    There are several ways of supplying the data for multinomial analysis. In this case, totals in the 9 classes are supplied in a single grouped response. It is analysed using a multiple (8) threshold model as in McCullagh and Nelder (1994) with the \ASReml\ code
     McCullagh and Nelder Cheese example p 175
      Cheese !A
      Rating !G 9 Total
     Cheese.txt
    
     Rating !MULT 9 !CUM ~ Trait Cheese
     PREDICT Cheese
    
    where Cheese.txt contains the data laid out as in Table 1. The model term Trait fits the thresholds and interpreting the model as a threshold model implies it should not be interacted with other terms. Nevertheless, sometimes an interaction is fitted. Note that \ASReml\ does not have a procedure for multinomial data which is not ordered (except as fitted with a log linear model), nd fitting a bivariate analysis involving a multinomial trait is not possible. The output is
     Univariate analysis of Rating
     Summary of 4 records retained of 4 read
    
      Model term          Size #miss #zero   MinNon0    Mean      MaxNon0  StndDevn
       1 Cheese              4     0     0      1     2.5000          4
       2 Rating       Variate      0     2  1.000      1.750      6.000      2.872
       2 Rating       Variate      0     2  1.000      2.500      9.000      4.359
       2 Rating       Variate      0     1  1.000      4.750      12.00      5.500
       2 Rating       Variate      0     0  1.000      6.750      11.00      4.193
       2 Rating       Variate      0     0  3.000      10.25      23.00      8.770
       2 Rating       Variate      0     0  6.000      7.000      8.000     0.8165
       2 Rating       Variate      0     0  1.000      9.750      19.00      8.221
       2 Rating       Variate      0     1  1.000      6.250      16.00      7.411
       2 Rating       Variate      0     2  1.000      3.000      11.00      5.354
       3 Total                     0     0  52.00      52.00      52.00      0.000
       4 Trait               8
     Forming      12 equations:  12 dense.
     Initial updates will be shrunk by factor    0.010
     Distribution and link: Cum. Multinomial; Logit  P=1/(1+exp(-XB))
     Warning: The LogL value is unsuitable for comparing GLM models
     Notice:      1 singularities detected in design matrix.
       1 LogL=-26.4243     S2=  1.0000         21 df    Dev/DF=  0.3356
       2 LogL=-26.4503     S2=  1.0000         21 df    Dev/DF=  0.3376
       3 LogL=-26.4506     S2=  1.0000         21 df    Dev/DF=  0.3376
       4 LogL=-26.4506     S2=  1.0000         21 df    Dev/DF=  0.3376
       5 LogL=-26.4506     S2=  1.0000         21 df    Dev/DF=  0.3376
     Deviance from GLM fit              21       20.31
     Variance heterogeneity factor [Deviance/DF]     0.97
    
              - - - Results from analysis of Rating - - -
     Notice: While convergence of the LogL value indicates that the model
             has stabilized, its value CANNOT be used to formally test differences
             between Generalized Linear (Mixed) Models.
    
     Source                Model  terms     Gamma     Component    Comp/SE   % C
    
                                       Wald F statistics
         Source of Variation           NumDF              F-inc
       4 Trait                             8              17.45
       1 Cheese                            3              38.38
    
     Warning: These Wald F statistics are based on the working variable and are
              not equivalent to an Analysis of Deviance. Standard errors are scaled
              by the variance of the working variable, not the residual deviance.
     Finished: 17 Jun 2008 13:19:51.484   LogL Converged
    
    Reverting to the collapsed lamb data, the two response variables FS1 and FS2 contain counts of the lambs with all feet sound, and with one foot deformed, respectively. The count for those with two or more deformed is given by difference from Total. A threshold model analysis of this data is given by the model line
    FS1 FS2 !mult 3 !TOTAL=Total ~ Trait SEX GRP !r SIRE output
     Notice:      1 singularities detected in design matrix.
       1 LogL=-105.631     S2=  1.0000        129 df    Dev/DF=   1.082
       2 LogL=-105.632     S2=  1.0000        129 df    Dev/DF=   1.082
       3 LogL=-105.631     S2=  1.0000        129 df    Dev/DF=   1.081
       4 LogL=-105.628     S2=  1.0000        129 df    Dev/DF=   1.080
       5 LogL=-105.627     S2=  1.0000        129 df    Dev/DF=   1.079
       6 LogL=-105.627     S2=  1.0000        129 df    Dev/DF=   1.078
     Deviance from GLM fit             129      139.09
     Variance heterogeneity factor [Deviance/DF]     1.08
    
              - - - Results from analysis of FS1 FS2 - - -
     Notice: While convergence of the LogL value indicates that the model
             has stabilized, its value CANNOT be used to formally test differences
             between Generalized Linear (Mixed) Models.
    
     Source                Model  terms     Gamma     Component    Comp/SE   % C
     SIRE                     34     34  0.174697      0.174697       2.80   0 P
    
                                       Wald F statistics
         Source of Variation           NumDF     DenDF    F-inc             Prob
      11 Trait                             2      77.8   405.40            <.001
       3 SEX                               1     129.0     5.61            0.020
       2 GRP                               4      30.0     8.03            <.001
     Notice: The DenDF values are calculated ignoring fixed/boundary/singular
                 variance parameters using numerical derivatives.
    
     Warning: These Wald F statistics are based on the working variable and are
              not equivalent to an Analysis of Deviance. Standard errors are scaled
              by the variance of the working variable, not the residual deviance.
    
                         Solution       Standard Error    T-value     T-prev
       2 GRP
                        2  -0.727155       0.273336         -2.66
                        3   -1.76491       0.356573         -4.95     -2.93
                        4   -1.19399       0.273168         -4.37      1.61
                        5  -0.915605       0.242677         -3.77      1.16
       3 SEX
                        1  -0.197719       0.856093E-01     -2.31
      11 Trait
                        1    1.54993       0.200125          7.74
                        2    3.82051       0.216314         17.66     27.12
       4 SIRE                                 34 effects fitted
     Finished: 18 Jun 2008 12:35:09.062   LogL Converged
    

    Return to start