Multivariate Analysis

Introduction

Multivariate analysis is used here in the narrow sense of a multivariate mixed model. There are many other multivariate analysis techniques which are not covered by ASReml. Multivariate analysis is used when we are interested in estimating the correlations between distinct traits (for example, fleece weight and fibre diameter in sheep) and for repeated measures of a single trait.

Repeated measures (rats)

There are two basic forms of analysis of repeated measures data: random regression type models and multivariate models. The latter, described here, apply when there are a limited number of repeated measures and they are taken on each subject at the same times, so that the data has a multivariate structure.

Wolfinger (1996) summarises a range of variance structures that can be fitted to repeated measures data and demonstrates the models using five weights taken weekly on 27 rats subjected to 3 treatments.

Multiple traits: Wether trial data

Three key traits for the Australian wool industry are the weight of wool grown per year, the cleanness and the diameter of that wool. Much of the wool is produced from wethers and most major producers have traditionally used a particular strain or 'bloodline'. The file wether.as specifies a bivariate analysis.

Model specification

The syntax for specifying a multivariate linear model in ASReml is
Y-variates ~ fixed [ !r random ] [ !f sparse_fixed ]
where
• Y-variates is a list of traits,
• fixed, random and sparse_fixed are as in the univariate case but involve the special term Trait and interactions with Trait.

The design matrix for Trait has a level (column) for each trait.
• Trait by itself fits the mean for each variate,
• in an interaction, Trait.Fac fits the factor Fac for each variate and Trait.Cov fits the covariate Cov for each variate.
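
As a rough illustration of the two bullets above, the following Python sketch builds design matrices for Trait and for Trait.Fac with the rows ordered Trait within data record. The function names, the trait-major column order and the toy YEAR levels are assumptions of this sketch and are not taken from ASReml.

```python
import numpy as np

def trait_design(n_records, t):
    """Design matrix for Trait: one column per trait, rows ordered Trait within record."""
    return np.tile(np.eye(t), (n_records, 1))

def trait_by_factor_design(levels, n_levels, t):
    """Design matrix for Trait.Fac: one column per (trait, factor level) pair.

    `levels` holds the 0-based factor level of each data record.
    Columns are laid out trait-major, which is an assumption of this sketch.
    """
    blocks = []
    for lev in levels:
        onehot = np.zeros((1, n_levels))
        onehot[0, lev] = 1.0
        # t rows for this record: row j selects (trait j, this record's level)
        blocks.append(np.kron(np.eye(t), onehot))
    return np.vstack(blocks)

years = [0, 2, 1]  # hypothetical YEAR levels for three data records
X_trait = trait_design(len(years), t=2)
X_trait_year = trait_by_factor_design(years, n_levels=3, t=2)
print(X_trait.shape)       # (6, 2)
print(X_trait_year.shape)  # (6, 6)
```

With two traits and a three-level factor the interaction has six columns, which matches the size of 6 reported for Trait.YEAR in the example output later in this chapter.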

ASReml internally rearranges the data so that n data records, each containing t traits, become n sets of t analysis records indexed by the internal factor Trait, i.e. nt analysis records ordered Trait within data record. If the data is already in this long form, use the !ASMV t qualifier to indicate that a multivariate analysis is required.
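
The rearrangement can be pictured with a small Python sketch. It only mimics the ordering described above (Trait within data record); the function name and the toy values are hypothetical and say nothing about how ASReml implements the step internally.

```python
import numpy as np

def expand_to_trait_within_record(Y):
    """Stack an (n, t) array of traits into n*t analysis records.

    Returns (record, trait, value) arrays ordered Trait within data record.
    """
    n, t = Y.shape
    record = np.repeat(np.arange(1, n + 1), t)  # 1,1,2,2,...
    trait = np.tile(np.arange(1, t + 1), n)     # 1,2,1,2,...
    value = Y.reshape(-1)                       # row-major: traits vary fastest
    return record, trait, value

# Hypothetical values of two traits for three animals; nan marks a missing trait.
Y = np.array([[7.1, 21.0],
              [6.8, np.nan],
              [8.0, 22.5]])
record, trait, value = expand_to_trait_within_record(Y)
for r, k, v in zip(record, trait, value):
    print(r, k, v)
```

Three data records with two traits give six analysis records, and a missing trait simply becomes a missing analysis record.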

Variance structures

A more sophisticated error structure is required for multivariate analysis. Consider a multivariate analysis with t traits and n units in which the data are ordered traits within units. A typical variance structure assumes that units are independent and that traits are correlated within units. This is described as the direct product of an IDENTITY matrix and an unstructured (US) variance matrix.
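
A minimal numerical sketch of this direct product, assuming small hypothetical sizes and an illustrative trait variance matrix (the numbers are not estimates from any analysis):

```python
import numpy as np

n, t = 3, 2                      # small hypothetical sizes
Sigma = np.array([[0.20, 0.13],  # illustrative variance matrix across traits
                  [0.13, 0.44]])

# Direct product of an identity matrix (units) and Sigma (traits),
# for data ordered traits within units.
R = np.kron(np.eye(n), Sigma)

print(R.shape)  # (6, 6)
same_unit_block = R[0:2, 0:2]    # traits of the same unit: equals Sigma
cross_unit_block = R[0:2, 2:4]   # traits of different units: all zero
```

The result is block diagonal: each unit contributes one copy of the trait variance matrix, and records from different units are uncorrelated.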

We discuss the syntax with reference to the following bivariate example
```
Orange Wether Trial 1984-8
 SheepID !I
 TRIAL
 BloodLine !I
 TEAM *
 YEAR *
 GFW YLD FDIAM
wether.dat !skip 1

GFW FDIAM ~ Trait Trait.YEAR,        # Fixed model
!r Trait.TEAM Trait.SheepID # Random model

predict YEAR Trait

1 2 2                                # Variance header
1485 0 ID                            # units structure
Trait 0 US                           # traits structure
3*0

Trait.TEAM 2                         # First G header
Trait 0 US !GP
3*0
TEAM 0 ID

Trait.SheepID 2                      # Second G header
Trait 0 US !GP
3*0
SheepID 0 ID
```

R-structure

For a standard multivariate analysis
• the error (R) structure for the residual must be specified as two-dimensional, with independent records and an unstructured variance matrix across traits; records may have observations missing in different patterns and these are handled internally during analysis,
• the R structure must be ordered traits within units, that is, the R structure definition line for units must be specified before the line for Trait,
• variance parameters are variances, not variance ratios,
• the R structure definition line for units, that is, 1485 0 ID, could be replaced by 0 or 0 0 ID; this tells ASReml to fill in the number of units and is a useful option when the exact number of units in the data is not known to the user,
• the error variance matrix for traits is specified by the model
      Trait 0 US
      3*0
  Three initial values for the matrix are required, being the lower triangle of the (symmetric) matrix specified row-wise. Finding reasonable initial values can be a problem. If initial values are written on the next line in the form q*0, where q is t(t + 1)/2 and t is the number of traits, as in the example, ASReml will take half of the phenotypic variance matrix of the data as an initial value.
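
To make the parameter count and the row-wise ordering concrete, here is a hedged Python sketch. It follows the rule stated above (half of the phenotypic variance matrix, lower triangle row-wise) on hypothetical data; the helper names are invented for illustration, and the source does not show how ASReml itself carries out this computation.

```python
import numpy as np

def n_us_parameters(t):
    """Number of parameters q = t(t + 1)/2 in an unstructured t x t matrix."""
    return t * (t + 1) // 2

def lower_triangle_rowwise(M):
    """Return the lower triangle of a symmetric matrix, row by row."""
    t = M.shape[0]
    return [float(M[i, j]) for i in range(t) for j in range(i + 1)]

# Hypothetical records of two traits (rows are units, columns are traits).
Y = np.array([[7.1, 21.0],
              [6.8, 23.1],
              [8.0, 22.5],
              [7.5, 20.9]])
P = np.cov(Y, rowvar=False)                # phenotypic variance matrix
start = lower_triangle_rowwise(0.5 * P)    # order: (1,1), (2,1), (2,2)

print(n_us_parameters(2), len(start))      # 3 3
```

For two traits q is 3, which is why the example writes 3*0; three traits would need 6*0.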

!ASUV and !ASMV

These special qualifiers relating to multivariate analysis allow for the situation when
• !ASUV: the data is in a multivariate layout but some residual variance structure other than IDENTITY cross US is required,
• !ASMV t: the data (file) is already in an expanded form (n sets of t records) and the multivariate residual variance structure IDENTITY cross US is required.
• To use an error structure other than IDENTITY cross US for the residual stratum you must (also) specify !ASUV on the datafile line and include mv in the model if there are missing values.
• To perform a multivariate analysis (including the automatic handling of missing values) when the data have already been expanded, use !ASMV t on the datafile line. Here t is the number of traits that ASReml should expect; the data file must have t records for each multivariate record, although some may be coded missing.

G-structure

For a standard multivariate analysis, a US structure is also used for the between-trait variance matrix of the random terms (as in the example). However, other structured models may be used, and may be necessary when there are more traits, because it is not unusual for US matrices to have no positive definite solution. Note the use of !GP to request that the estimated matrix be constrained to be positive definite, and the use of 3*0 in lieu of initial values; ASReml again substitutes a proportion of the observed variance-covariance matrix of the data.
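
Whether a given US matrix is positive definite can be checked from its eigenvalues. The sketch below rebuilds a symmetric matrix from its row-wise lower triangle and tests it; the helper names are hypothetical. The first set of values is the Trait.TEAM component estimates reported in the example output in this chapter, and the second is an invented matrix with an implied correlation above 1.

```python
import numpy as np

def us_matrix(lower):
    """Rebuild a symmetric matrix from its row-wise lower triangle."""
    q = len(lower)
    t = int(round((np.sqrt(8 * q + 1) - 1) / 2))  # invert q = t(t + 1)/2
    M = np.zeros((t, t))
    M[np.tril_indices(t)] = lower                 # fills row by row
    return M + np.tril(M, -1).T                   # mirror the strict lower triangle

def is_positive_definite(M):
    """True when every eigenvalue of the symmetric matrix M is positive."""
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

team = us_matrix([0.374493, 0.388740, 1.36533])   # from the example output
bad = us_matrix([1.0, 1.2, 1.0])                  # invented: correlation of 1.2

print(is_positive_definite(team), is_positive_definite(bad))
```

A check of this kind is only a diagnostic on a fitted or proposed matrix; it does not replace the constraint that !GP applies during estimation.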

Example

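Below is the output returned in the .asr file for this analysis, except that the !GP qualifiers were omitted. In each Covariance/Variance/Correlation Matrix, the variances are on the diagonal, the covariance is below the diagonal and the correlation is above it; for example, 0.4360 is the residual correlation between the two traits.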
```
ASReml 1.63o [01 Jun 2005]  Orange Wether Trial  1984-88
Build: j [01 Jul 2005]  32 bit
13 Jul 2005 09:38:00.928   32.00 Mbyte Windows   wether

Folder: C:\data\asr\UG2\manex
TAG  !I
BloodLine !I
QUALIFIERS: !SKIP 1
Reading wether.dat  FREE FORMAT skipping     1 lines

Bivariate analysis of GFW and FDIAM
Using     1485 records of    1485 read
Model term                  Size #miss #zero   MinNon0    Mean      MaxNon0
1 TAG                       521     0     0      1   261.0956        521
2 TRIAL                             0     0  3.000      3.000      3.000
3 BloodLine                  27     0     0      1    13.4323         27
4 TEAM                       35     0     0      1    18.0067         35
5 YEAR                        3     0     0      1     2.0391          3
6 GFW                  Variate      0     0  4.100      7.478      11.20
7 YLD                               0     0  60.30      75.11      88.60
8 FDIAM                Variate      0     0  15.90      22.29      30.60
9 Trait                       2
10 Trait.YEAR                  6  9 Trait     :   2   5 YEAR           :    3
11 Trait.TEAM                 70  9 Trait     :   2   4 TEAM           :   35
12 Trait.TAG                1042  9 Trait     :   2   1 TAG            :  521
1485  identity
2  UnStructure    0.2000    0.2000    0.4000
2970 records assumed sorted    2 within    1485
2  UnStructure    0.4000    0.3000    1.3000
35  identity
Structure for Trait.TEAM         has      70 levels defined
2  UnStructure    0.2000    0.2000    2.0000
521  identity
Structure for Trait.TAG          has    1042 levels defined
Forming    1120 equations:   8 dense.
Initial updates will be shrunk by factor    0.316
Notice: Algebraic ANOVA Denominator DF calculation is not available
Empirical derivatives will be used.
NOTICE:      2 singularities detected in design matrix.
1 LogL=-886.521     S2=  1.0000       2964 df
2 LogL=-818.508     S2=  1.0000       2964 df
3 LogL=-755.911     S2=  1.0000       2964 df
4 LogL=-725.374     S2=  1.0000       2964 df
5 LogL=-723.475     S2=  1.0000       2964 df
6 LogL=-723.462     S2=  1.0000       2964 df
7 LogL=-723.462     S2=  1.0000       2964 df
8 LogL=-723.462     S2=  1.0000       2964 df

Source                Model  terms     Gamma     Component    Comp/SE
Residual            UnStru   2   1  0.128890      0.128890      12.40   0 U
Residual            UnStru   2   2  0.440601      0.440601      21.93   0 U
Trait.TEAM          UnStru   1   1  0.374493      0.374493       3.89   0 U
Trait.TEAM          UnStru   2   1  0.388740      0.388740       2.60   0 U
Trait.TEAM          UnStru   2   2   1.36533       1.36533       3.74   0 U
Trait.TAG           UnStru   1   1  0.257159      0.257159      12.09   0 U
Trait.TAG           UnStru   2   1  0.219557      0.219557       5.55   0 U
Trait.TAG           UnStru   2   2   1.92082       1.92082      14.35   0 U
Covariance/Variance/Correlation Matrix UnStructured
0.1984     0.4360
0.1289     0.4406
Covariance/Variance/Correlation Matrix UnStructured
0.3745     0.5436
0.3887      1.365
Covariance/Variance/Correlation Matrix UnStructured
0.2572     0.3124
0.2196      1.921

Wald F statistics
Source of Variation           NumDF     DenDF    F-inc             Prob
9 Trait                             2      33.0  5761.58            <.001
10 Trait.YEAR                        4    1162.2  1094.90            <.001
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using empirical derivatives.

Solution       Standard Error    T-value     T-prev
10 Trait.YEAR
2  -0.102262       0.290190E-01     -3.52
3    1.06636       0.290831E-01     36.67     42.07
5    1.17407       0.433905E-01     27.06
6    2.53439       0.434880E-01     58.28     32.85
9 Trait
1    7.13717       0.107933         66.13
2    21.0569       0.209095        100.71     78.16
11 Trait.TEAM                           70 effects fitted
12 Trait.TAG                          1042 effects fitted
SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section   1
1.00   1.54
10  possible outliers: see .res file
Finished: 13 Jul 2005 09:38:05.725   LogL Converged
```
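
The figures in this output can be cross-checked by hand. The Python sketch below recomputes the correlations from the reported components and checks some of the counts; the variable names are hypothetical, and reading the degrees of freedom as records minus non-singular fixed effects is an interpretation of the listing rather than something stated in it.

```python
import math

def correlation(var1, cov, var2):
    """Correlation implied by two variances and their covariance."""
    return cov / math.sqrt(var1 * var2)

# Components copied from the listing (the residual 1 1 value is taken
# from the printed matrix, where it appears as 0.1984).
resid_corr = correlation(0.1984, 0.128890, 0.440601)
team_corr = correlation(0.374493, 0.388740, 1.36533)
tag_corr = correlation(0.257159, 0.219557, 1.92082)
print(resid_corr, team_corr, tag_corr)

# Bookkeeping that appears consistent with the listing.
n_units, t = 1485, 2
analysis_records = n_units * t             # "2970 records assumed sorted"
equations = 8 + 70 + 1042                  # dense + Trait.TEAM + Trait.TAG
residual_df = analysis_records - (8 - 2)   # 8 dense equations, 2 singular
print(analysis_records, equations, residual_df)
```

The three correlations agree, to the precision printed, with the upper-triangle entries 0.4360, 0.5436 and 0.3124, and the counts reproduce the 2970 records, 1120 equations and 2964 df shown in the output.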