Generalized Linear Mixed Models
Introduction
ASReml uses an approximate likelihood technique called
penalized quasi-likelihood (PQL)
to analyse
data sampled from one of the common members of the exponential family.
In this section we present a few examples to demonstrate the coding in ASReml.
Binomial analysis of Footrot score
Mohommad Alwan (pers comm) for his Master thesis at Massey University
scored the feet of 2513 lambs born in 1980 and 1981.
The lambs were from 5 mating groups: 7 Perendale rams over Perendale ewes in 1980, 6 Booroola by Romney rams over Perendale ewes in 1980,
3 Booroola rams over Romney ewes in 1980,
6 Perendale rams over Perendale ewes in 1981, and
12 Booroola by Romney rams (from froup 3) over Perendale ewes in 1981.
This data was analysed by Gilmour (1984) and Gilmour et al. (1987).
The data file LAMB.DAT contains grouped data for the 68 combinations
of Sex and Sire for two footshape classes: FS1, all four feet are normal,
FS2, one foot is deformed; and two indicator variables for the presence
of disease conditions Scald and Rot. No scald or rot was present in group 4 lambs and these responses have been set to missing.
The genetic relationships among sires are ignored in this analysis
although it would just require a sire relationship matrix to include them.
Our first analysis is of the incidence of foot rot on the Normal scale
as a weighted analysis to mimic analysis of the ungrouped data.
Using 56 of the 68 records (ignoring Group 4), there
are 1960 (=56 cross 35.00) observations and so we use the !DF 1904 (=1960-56) qualifier
to get the correct residual degrees of freedom for this analysis of the
proportion with footrot. The !YSS 62.54249 qualifier adds 62.54249=67-4.45751
to the Total Sum of Squares so that it includes the extra variation associated
with the extra degrees of freedom. There were 67 (=56*1.196) cases of foot
rot so the Total uncorrected Sum of Squares for a binary variable should be 67.
However the weighted sum of squares for the pRot values is only
4.45751 (for example the first record contributes 1/39=(1/39)2 cross 39 instead of 1.0. 4.45751
was discovered from the .asl file on the line
4.45751 SSPD before inserting the !YSS qualifier.
The transformations in the code which follows convert
Scald and Rot to 'missing' for group 4.
Lamb data from ARG thesis page 177-8
Year GRP 5 !V99=V2 !==4 !M1
SEX SIRE !I
Total
FS1 FS2 Scald !+V99 Rot !+V99
pRot !=Rot !/Total
# 1 1 1 101 39 33 6 6 1
LAMB.DAT !skip 1
!DF 1904 !YSS 62.54249
pRot !TOTAL=Total ~ mu SEX GRP !r SIRE
predict SEX 0 1 GRP 1 2 3 5
The pertinant results are
Univariate analysis of pRot
Summary of 56 records retained of 68 read
Model term Size #miss #zero MinNon0 Mean MaxNon0 StndDevn
1 Year 0 0 1.000 1.536 2.000 0.5032
2 GRP 5 0 0 1 3.1429 5
3 SEX 0 28 1.000 0.5000 1.000 0.5045
4 SIRE 34 0 0 1 17.0714 34
5 Total Weight 0 0 16.00 35.00 64.00 12.89
6 FS1 0 0 6.000 23.46 50.00 10.76
7 FS2 0 0 3.000 10.14 30.00 5.661
8 Scald 0 13 1.000 3.071 16.00 3.458
9 Rot 0 19 1.000 1.196 4.000 1.151
10 pRot Variate 0 19 0.1754E-01 0.3606E-01 0.1818 0.3833E-01
11 mu 1
12 SEX.GRP 5 3 SEX : 1 2 GRP : 5
Forming 46 equations: 12 dense.
Initial updates will be shrunk by factor 0.224
Notice: Algebraic Denominator DF calculation is not available
Numerical derivatives will be used.
Notice: 4 singularities detected in design matrix.
1 LogL= 2423.41 S2= 0.32397E-01 1952 df : 1 components restrained
2 LogL= 2431.71 S2= 0.32792E-01 1952 df 0.6325E-02 1.000
3 LogL= 2431.80 S2= 0.32737E-01 1952 df 0.9265E-02 1.000
4 LogL= 2431.80 S2= 0.32738E-01 1952 df 0.9200E-02 1.000
Final parameter values 0.92543E-02 1.0000
- - - Results from analysis of pRot - - -
Approximate stratum variance decomposition
Stratum Degrees-Freedom Variance Component Coefficients
SIRE 25.70 0.506971E-01 59.7 1.0
Residual Variance 15.83 0.327367E-01 0.0 1.0
Source Model terms Gamma Component Comp/SE % C
SIRE 34 34 0.918415E-02 0.300659E-03 0.98 -22 P
Variance 56 1952 1.00000 0.327367E-01 2.81 0 P
Wald F statistics
Source of Variation NumDF DenDF F-inc Prob
11 mu 1 19.9 42.79 <.001
3 SEX 1 16.2 0.02 0.882
2 GRP 3 21.9 2.04 0.139
12 SEX.GRP 3 16.1 0.39 0.763
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using numerical derivatives.
4 SIRE 34 effects fitted ( 6 are zero)
Two things stand out in this analysis. From a genetic perspective, the
heritability estimate is 0.0364=4 *.0003007/(.0003007+.0327367)
This can be calculated
in ASReml with the .pin file commands
F GenVar 1*4
F TotVar 1 2
H heritability 3 4
Secondly, there is little evidence of significant difference between classes.
The predicted values are
Sex | PxP 1980 | BRxP 1980 | BxR 1980 | BRxP 1981
|
0 | 0.0183 +/- 0.0130 |
0.0432 +/- 0.0126 |
0.0758 +/- 0.0268 |
0.0305 +/- 0.0111
|
1 | 0.0152 +/- 0.0132 |
0.0375 +/- 0.0124 |
0.0603 +/- 0.0244 |
0.0425 +/- 0.0108
|
An analysis of footrot as a binomial variable using the logistic link
is performed by the model line (and dropping the !DF qualifier).
Rot !bin !TOTAL=Total ~ mu SEX GRP SEX.GRP !r SIRE .16783
pertinant results are
Distribution and link: Binomial; Logit Mu=P=1/(1+exp(-XB))
V=Mu(1-Mu)/N
Warning: The LogL value is unsuitable for comparing GLM models
Notice: 4 singularities detected in design matrix.
1 LogL=-28.1544 S2= 1.0000 48 df Dev/DF= 0.9060
2 LogL=-28.7417 S2= 1.0000 48 df Dev/DF= 0.8897
3 LogL=-28.7186 S2= 1.0000 48 df Dev/DF= 0.8805
4 LogL=-28.6705 S2= 1.0000 48 df Dev/DF= 0.8551
5 LogL=-28.6494 S2= 1.0000 48 df Dev/DF= 0.8238
6 LogL=-28.6687 S2= 1.0000 48 df Dev/DF= 0.7959
7 LogL=-28.6774 S2= 1.0000 48 df Dev/DF= 0.7915
8 LogL=-28.6784 S2= 1.0000 48 df Dev/DF= 0.7909
9 LogL=-28.6785 S2= 1.0000 48 df Dev/DF= 0.7908
Final parameter values 0.26321 1.0000
Deviance from GLM fit 48 37.96
Variance heterogeneity factor [Deviance/DF] 0.79
- - - Results from analysis of Rot - - -
Notice: While convergence of the LogL value indicates that the model
has stabilized, its value CANNOT be used to formally test differences
between Generalized Linear (Mixed) Models.
Approximate stratum variance decomposition
Stratum Degrees-Freedom Variance Component Coefficients
SIRE 3.10 0.263207 1.0
Source Model terms Gamma Component Comp/SE % C
SIRE 34 34 0.263207 0.263207 1.25 0 P
Variance 56 48 1.00000 1.00000 0.00 0 F
Wald F statistics
Source of Variation NumDF DenDF F-inc Prob
11 mu 1 20.2 418.38 <.001
3 SEX 1 48.0 0.02 0.881
2 GRP 3 21.5 1.99 0.146
12 SEX.GRP 3 NA 0.36 NA
The effects in this analysis are on a logistic
scale with a variance of
3.28987=π2/3 and so the heritability on the underlying (logistic) scale
is 0.296=4 * 0.2632/( 3.28987+0.26321). This can be calculated
in ASReml with the .pin file commands
F GenVar 1*4
F TotVar 1 4*3.28987
H heritability 3 4
Repeating the analysis on the Probit scale by inserting !PROBIT after
!BIN in the model line produces a Sire component of 0.0514 on the
Probit scale which has an underlying variance of 1.0. The heritabily estimate is then 0.196.
Given the incidence (0.034), the heritability on the probit scale
is expected to be around 0.215=0.0364/(z2/pq) where z=0.0758 is the ordinate of a Normal(0,1) corresponding to p=1-q=0.034.
The preceding Wald F Statistics pertain to the working variable
created as part of the PQL analysis.
The SEX.GRP interaction is clearly not significant
even though ASReml was not able to calculate a plausible value
for the Denominator DF for this summarized data. The predicted
means shown below are not that different from those
obtained from analysis on the 0,1 scale but the standard errors are very different.
These predicted means have been backtransformed by \ASReml\ from the underlying (logistic)
scale to the probablity scale.
The initial analysis (on the 0,1 probability scale) ignores the variance differences
associated with binomial data.
PxP 1980 | BRxP 1980 | BxR 1980 | BRxP 1981
|
0 | 0.0180 +/- 0.0070 |
0.0430 +/- 0.0124 |
0.0748 +/- 0.0323 |
0.0281 +/- 0.0083
|
1 | 0.0151 +/- 0.0063 |
0.0373 +/- 0.0110 |
0.0592 +/- 0.0257 |
0.0401 +/- 0.0103 | |
ASReml has an
'Analysis of Deviance' option which we now demonstrate.
In a mixed model, the variance components will change
depending on which fixed terms are in the model. This
will invalidate the Analysis of Deviance unless the variance components
are fixed at the full model solution.
So, fitting the model line
Rot !bin !TOT=Total !AODEV ~ mu SEX GRP SEX.GRP !r SIRE .2632 !GF}
produces the Analysis of Deviance
Analysis of Deviance Table for Rot
Source of Variation df Deviance Derived F
SEX 1 0.02 0.021
GRP 3 4.35 1.833
SEX.GRP 3 1.16 0.487
Deviance from GLM fit 48 37.96
Variance heterogeneity factor [Deviance/DF] 0.79
The Deviance is the deviance calculated from the
binomial part of the log-likelihood. This is distinct from the
log-likelihood obtained by the REML algorithm which pertains
to the working variable. Since the working variable changes
with the model fitted, the LogL values are not comparable between
models.
The heterogeneity factor is the Deviance / df
and gives some indication as to how well the discrete distribution
fits the data. A value greater than 1 suggests the data is over-dispersed,
that is the data values are more variable than expected
under the chosen distribution.
There is also a !DISPERSION [d] qualifier. If d is supplied, it
serves as a scaling factor for the weights in the analysis, changing the
reported variances and standard deviations. If d is not supplied,
it is estimated from the residual as the model is fitted to the working variable.
ASReml solves for the linear effects twice (see the !GLMM qualifier)
iteration of the variance components so that the variance component updates
are based on solutions obtained using the same variance parameters.
I.e. We start with a set of solutions and some parameters.
We use these to update the solutions. Then use the updated solutions
to update the variance parameter.
Bivariate analysis of binomial Foot score and Normal weight
The data file BINNOR.txt contains the expanded version (2513 records) of
the lamb data from the previous example augmented with an extra
simulated variable YVar.
It was
created from the summarized data without knowing which actual individuals
had which combinations of trait values.
The binary variable Score1 indicates
whether all four feet are sound. The following code produces a bivariate
analysis of Score1 on the underlying logistic scale and YVar
on the Normal scale.
Lamb data from ARG thesis page 177-8
Year GRP 5 !V99=V2 !==4 !M 1
SEX SIRE !I
Score1
Score2 Scald !+V99 Rot !+V99
YVar
binnor.txt !skip 1 !ASUV !MAXIT 40
Score1 YVar !bin ~ Trait.SEX Trait.GRP !r Trait.SIRE
1 2 1
2513
2 0 US !GFPP
1 .01 0.25
Trait.SIRE 2
Trait 0 US 0.015 0.01 1.05
SIRE
There are several issues addressed in this code.
!ASUV is required, and if there had been any missing values
in the data, the fixed model term mv would also be required.
ASReml constructs the R matrix by scaling the reported matrix by the
binomial variance calculated from the fitted value of the binomial variate.
Consequently, to avoid over/under dispersion being also fitted,
the residual 'variance' for the binomial trait is fixed at 1.0
by giving its initial value as 1.0 and using the qualifier !GFPP.
The response variables must be listed before the qualifiers.
If written as \as{Score !BIN YVar}, YVar would be parsed as an argument to !BIN
rather than as a response variable.
Only one categorical response is permitted, and it must be specified first.
output follows.
Distribution and link: Binomial; Logit Mu=P=1/(1+exp(-XB))
V=Mu(1-Mu)/N
Warning: The LogL value is unsuitable for comparing GLM models
1 LogL=-894.974 S2= 1.0000 5014 df Dev/DF= 0.6196
2 LogL=-894.554 S2= 1.0000 5014 df Dev/DF= 0.6194
3 LogL=-890.600 S2= 1.0000 5014 df Dev/DF= 0.6178
4 LogL=-884.431 S2= 1.0000 5014 df Dev/DF= 0.6144
5 LogL=-885.759 S2= 1.0000 5014 df Dev/DF= 0.6109
6 LogL=-892.413 S2= 1.0000 5014 df Dev/DF= 0.6085
7 LogL=-896.969 S2= 1.0000 5014 df Dev/DF= 0.6077
8 LogL=-897.941 S2= 1.0000 5014 df Dev/DF= 0.6076
9 LogL=-897.962 S2= 1.0000 5014 df : 1 components restrained
10 LogL=-897.962 S2= 1.0000 5014 df Dev/DF= 0.6076
11 LogL=-897.961 S2= 1.0000 5014 df Dev/DF= 0.6076
Deviance from GLM fit 5014 3046.50
Variance heterogeneity factor [Deviance/DF] 0.61
- - - Results from analysis of Score1 YVar - - -
Notice: While convergence of the LogL value indicates that the model
has stabilized, its value CANNOT be used to formally test differences
between Generalized Linear (Mixed) Models.
Source Model terms Gamma Component Comp/SE % C
Residual UnStructured 2 1 -0.162615E-03 -0.162615E-03 -0.03 0 P
Residual UnStructured 2 2 0.255609 0.255609 35.20 0 P
Trait.SIRE UnStructured 1 1 0.166092 0.166092 2.73 0 U
Trait.SIRE UnStructured 2 1 0.330313E-02 0.330313E-02 0.07 0 U
Trait.SIRE UnStructured 2 2 0.303900 0.303900 3.76 0 U
Covariance/Variance/Correlation Matrix UnStructured Residual
1.000 -0.3216E-03
-0.1626E-03 0.2556
Covariance/Variance/Correlation Matrix UnStructured Trait.SIRE
0.1661 0.1470E-01
0.3303E-02 0.3039
Wald F statistics
Source of Variation NumDF DenDF-con F-inc F-con M P-con
11 Trait.SEX 2 NA 393.15 76.10 A NA
12 Trait.GRP 10 40.9 1993.52 1993.52 A <.001
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using numerical derivatives.
YVar data was artificially created and the SIRE variance is too large
represent purely genetic variance.
Ordinal analysis of Footrot score
By way of introduction to ordinal analysis in ASReml consider
the cheese data from page 175 of McCullagh and Nelder (1994).
Four cheeses were scored on a ninepoint scale by 52 tasters giving
Table 1 Response frequencies in a cheese tasting experiment
I | II | III | IV | V | VI | VII | VIII | IX | Total
| A | 0 | 0 | 1 | 7 | 8 | 8 | 19 | 8 | 1 | 52
| B | 6 | 9 | 12 | 11 | 7 | 6 | 1 | 0 | 0 | 52
| C | 1 | 1 | 6 | 8 | 23 | 7 | 5 | 1 | 0 | 52
| D | 0 | 0 | 0 | 1 | 3 | 7 | 14 | 16 | 11 | 52
| |
There are several ways of supplying the data for multinomial analysis.
In this case, totals in the 9 classes are supplied in a single grouped
response.
It is analysed using a multiple (8) threshold model as in
McCullagh and Nelder (1994) with the \ASReml\ code
McCullagh and Nelder Cheese example p 175
Cheese !A
Rating !G 9 Total
Cheese.txt
Rating !MULT 9 !CUM ~ Trait Cheese
PREDICT Cheese
where Cheese.txt contains the data laid out as in Table 1.
The model term Trait fits the thresholds and interpreting the model
as a threshold model implies it should not be interacted with
other terms. Nevertheless, sometimes an interaction is fitted.
Note that \ASReml\ does not have a procedure for multinomial data
which is not ordered (except as fitted with a log linear model),
nd fitting a bivariate analysis involving a multinomial trait
is not possible.
The output is
Univariate analysis of Rating
Summary of 4 records retained of 4 read
Model term Size #miss #zero MinNon0 Mean MaxNon0 StndDevn
1 Cheese 4 0 0 1 2.5000 4
2 Rating Variate 0 2 1.000 1.750 6.000 2.872
2 Rating Variate 0 2 1.000 2.500 9.000 4.359
2 Rating Variate 0 1 1.000 4.750 12.00 5.500
2 Rating Variate 0 0 1.000 6.750 11.00 4.193
2 Rating Variate 0 0 3.000 10.25 23.00 8.770
2 Rating Variate 0 0 6.000 7.000 8.000 0.8165
2 Rating Variate 0 0 1.000 9.750 19.00 8.221
2 Rating Variate 0 1 1.000 6.250 16.00 7.411
2 Rating Variate 0 2 1.000 3.000 11.00 5.354
3 Total 0 0 52.00 52.00 52.00 0.000
4 Trait 8
Forming 12 equations: 12 dense.
Initial updates will be shrunk by factor 0.010
Distribution and link: Cum. Multinomial; Logit P=1/(1+exp(-XB))
Warning: The LogL value is unsuitable for comparing GLM models
Notice: 1 singularities detected in design matrix.
1 LogL=-26.4243 S2= 1.0000 21 df Dev/DF= 0.3356
2 LogL=-26.4503 S2= 1.0000 21 df Dev/DF= 0.3376
3 LogL=-26.4506 S2= 1.0000 21 df Dev/DF= 0.3376
4 LogL=-26.4506 S2= 1.0000 21 df Dev/DF= 0.3376
5 LogL=-26.4506 S2= 1.0000 21 df Dev/DF= 0.3376
Deviance from GLM fit 21 20.31
Variance heterogeneity factor [Deviance/DF] 0.97
- - - Results from analysis of Rating - - -
Notice: While convergence of the LogL value indicates that the model
has stabilized, its value CANNOT be used to formally test differences
between Generalized Linear (Mixed) Models.
Source Model terms Gamma Component Comp/SE % C
Wald F statistics
Source of Variation NumDF F-inc
4 Trait 8 17.45
1 Cheese 3 38.38
Warning: These Wald F statistics are based on the working variable and are
not equivalent to an Analysis of Deviance. Standard errors are scaled
by the variance of the working variable, not the residual deviance.
Finished: 17 Jun 2008 13:19:51.484 LogL Converged
Reverting to the collapsed lamb data, the two response variables FS1
and FS2 contain counts of the lambs with all feet sound, and with
one foot deformed, respectively. The count for those with two or more deformed
is given by difference from Total. A threshold model analysis
of this data is given by the model line
FS1 FS2 !mult 3 !TOTAL=Total ~ Trait SEX GRP !r SIRE
output
Notice: 1 singularities detected in design matrix.
1 LogL=-105.631 S2= 1.0000 129 df Dev/DF= 1.082
2 LogL=-105.632 S2= 1.0000 129 df Dev/DF= 1.082
3 LogL=-105.631 S2= 1.0000 129 df Dev/DF= 1.081
4 LogL=-105.628 S2= 1.0000 129 df Dev/DF= 1.080
5 LogL=-105.627 S2= 1.0000 129 df Dev/DF= 1.079
6 LogL=-105.627 S2= 1.0000 129 df Dev/DF= 1.078
Deviance from GLM fit 129 139.09
Variance heterogeneity factor [Deviance/DF] 1.08
- - - Results from analysis of FS1 FS2 - - -
Notice: While convergence of the LogL value indicates that the model
has stabilized, its value CANNOT be used to formally test differences
between Generalized Linear (Mixed) Models.
Source Model terms Gamma Component Comp/SE % C
SIRE 34 34 0.174697 0.174697 2.80 0 P
Wald F statistics
Source of Variation NumDF DenDF F-inc Prob
11 Trait 2 77.8 405.40 <.001
3 SEX 1 129.0 5.61 0.020
2 GRP 4 30.0 8.03 <.001
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using numerical derivatives.
Warning: These Wald F statistics are based on the working variable and are
not equivalent to an Analysis of Deviance. Standard errors are scaled
by the variance of the working variable, not the residual deviance.
Solution Standard Error T-value T-prev
2 GRP
2 -0.727155 0.273336 -2.66
3 -1.76491 0.356573 -4.95 -2.93
4 -1.19399 0.273168 -4.37 1.61
5 -0.915605 0.242677 -3.77 1.16
3 SEX
1 -0.197719 0.856093E-01 -2.31
11 Trait
1 1.54993 0.200125 7.74
2 3.82051 0.216314 17.66 27.12
4 SIRE 34 effects fitted
Finished: 18 Jun 2008 12:35:09.062 LogL Converged
Return to start