The data fields are defined immediately after the job
title line. These definitions tell ASReml how many fields to expect in the data file and
how each one is to be interpreted. No more than 10,000 variables may be read or formed.
Data field definitions:
should be given for all fields in the data file; data fields at the end of a data line that do not have a corresponding field definition are ignored,
must be presented in the order in which the fields appear in the data file,
must be indented one or more spaces,
can appear with other definitions on the same line,
can transform a data field as it is defined (see below),
can create additional data fields by transformation; these
should be listed after the data fields read from the data file.
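The positional matching of data values to field definitions can be pictured with a short Python sketch. It assumes whitespace-separated data lines and uses a hypothetical helper, `read_record`. It only illustrates two rules: definitions follow file order, and trailing fields without a definition are dropped. It is not how ASReml itself parses data files.

```python
def read_record(line: str, field_names: list[str]) -> dict[str, str]:
    """Pair the values on one data line with field definitions, in file order.

    Hypothetical helper; assumes values are separated by whitespace.
    """
    values = line.split()
    # zip() stops at the shorter sequence, so values at the end of the line
    # that have no field definition are silently ignored.
    return dict(zip(field_names, values))


record = read_record("3 Control 4.52 extra", ["Blocks", "Treatments", "yield"])
print(record)  # {'Blocks': '3', 'Treatments': 'Control', 'yield': '4.52'}
```

The sketch does not model a line that has fewer values than definitions. In that case `zip` would simply drop the unmatched definitions, which should not be taken as ASReml's behaviour.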
Syntax
Usually there will be a field definition for every data field.
For example, the field definitions for a simple randomised block
experiment might be:
Randomised Block Experiment # Title Line
 Blocks * # coded 1...
 Treatments !A # alphabetic names
 yield # response variable
rcb.dat # data file
yield ~ mu Treatments !r Blocks # model line
In the ASReml command file, each field definition consists of a label, optionally followed by a FieldType and any transformations, where
label
has a maximum of 31 characters, of which only the first 20 are printed,
must begin with a letter,
must not contain the special characters ., *, :, /, !, #, | or (,
must not be the name of a predefined model term or variance structure,
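The label rules above can be collected into a small Python checker. This is a sketch only. The `reserved` argument stands in for the names of predefined model terms and variance structures, whose full list is not reproduced here, so the example uses `mu` from the model line above. The sketch also assumes exact, case-sensitive name matching, which the guide does not specify.

```python
# Characters the guide lists as not allowed in a field label.
FORBIDDEN = set(".*:/!#|(")


def is_valid_label(label: str, reserved: frozenset[str] = frozenset()) -> bool:
    """Check a field label against the naming rules listed above.

    `reserved` is a stand-in for predefined model term and variance
    structure names; exact-match comparison is an assumption.
    """
    if not label or len(label) > 31:                 # at most 31 characters
        return False
    first = label[0]
    if not (first.isascii() and first.isalpha()):    # must begin with a letter
        return False
    if any(ch in FORBIDDEN for ch in label):         # no special characters
        return False
    return label not in reserved                     # not a predefined name


def printed_label(label: str) -> str:
    """Only the first 20 characters of a label are printed."""
    return label[:20]


print(is_valid_label("yield"))                     # True
print(is_valid_label("mu", frozenset({"mu"})))     # False
```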
FieldType
defines how a variable is interpreted as it is read, and whether it is
treated as a factor or a variate when it appears in the linear model:
for a simple variate, leave FieldType blank or specify 1,
for a model factor, one of the qualifiers below is required,
depending on how the factor is coded, where
n is the number of levels of the factor and
s is a list of labels to be assigned to the levels:
* or n
is used when the data field has values 1...n
directly coding for the factor, unless the levels are to be labelled
(see !L); for example Row *,
!A [n]
is required if the data field is alphanumeric;
n must be specified if more than 2000 level names are present,
for example Location !A,
!I [n]
is required if the data field is numeric but not coded 1...n;
n must be specified if more than 1000 codes are present,
for example Year !I,
!AS p
is required if the data field is similar to a previous !A
or !I factor p and is to be coded identically,
for example in a plant diallel experiment
 Male !A 22 Female !AS Male # integrated coding
!L s
is used when the data field is numeric with values 1...n
and labels are to be assigned to the n levels, for example
 Sex !L Male Female
If there are many labels, they may be written over several lines by using a
trailing comma to indicate continuation of the list.
!P
indicates the special case of a pedigree factor;
ASReml will determine the levels from the pedigree file.
In all these cases, a warning is printed if the nominated value of n does not agree with the actual number
of levels found in the data; if the nominated value is too small, the correct value is used.
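The coding schemes can be sketched in Python under one assumption: new !A and !I levels receive codes 1, 2, 3, ... in order of first appearance, as the section on reordering factor levels describes. The class `LevelCoder` is hypothetical and does not correspond to an ASReml interface. For !AS, the sketch models "coded identically" as two fields sharing one coding table.

```python
class LevelCoder:
    """Give each new label the next integer code, in order of first appearance."""

    def __init__(self) -> None:
        self.codes: dict[str, int] = {}

    def encode(self, label: str) -> int:
        if label not in self.codes:
            self.codes[label] = len(self.codes) + 1  # codes run 1, 2, 3, ...
        return self.codes[label]

    def levels(self) -> list[str]:
        return list(self.codes)  # dicts keep insertion order


# Male !A / Female !AS Male: both fields share one table, so a parent
# receives the same code whichever column it appears in.
parents = LevelCoder()
crosses = [("P3", "P1"), ("P1", "P2"), ("P2", "P3")]
coded = [(parents.encode(m), parents.encode(f)) for m, f in crosses]
print(coded)             # [(1, 2), (2, 3), (3, 1)]
print(parents.levels())  # ['P3', 'P1', 'P2']

# !L: codes 1...n are already in the data; labels are simply attached.
sex_labels = ["Male", "Female"]  # Sex !L Male Female
print(sex_labels[2 - 1])         # Female
```

An !I field is handled the same way, with numeric strings such as "1998" in place of alphabetic labels.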
!G m [n]
is used when m contiguous data fields are to be
treated as a set or group of variates (n omitted or 1) or factor variables (n > 1). For example,
:
 X1 X2 X3 X4 X5 y
data.dat
y ~ mu X1 X2 X3 X4 X5
can be expressed as
:
 X !G 5 y
data.dat
y ~ mu X
so that, by using X !G 5, the five variates can be referred to collectively in the model as X.
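The Python sketch below shows how a group consumes m contiguous fields from a data line, so that the whole set travels under one name. The helper `read_grouped` and its `(name, m)` spec format are hypothetical. The sketch assumes whitespace-separated numeric fields and covers only the variate case (n omitted or 1).

```python
from typing import Optional, Union


def read_grouped(
    line: str, spec: list[tuple[str, Optional[int]]]
) -> dict[str, Union[float, list[float]]]:
    """Read one whitespace-separated line of numbers.

    spec holds (name, m) pairs: m is None for an ordinary field, or the
    number of contiguous fields gathered into a group, as with `name !G m`.
    """
    values = [float(v) for v in line.split()]
    record: dict[str, Union[float, list[float]]] = {}
    pos = 0
    for name, m in spec:
        if m is None:
            record[name] = values[pos]
            pos += 1
        else:
            record[name] = values[pos:pos + m]  # m contiguous fields
            pos += m
    return record


# Equivalent of the field line "X !G 5 y"
print(read_grouped("1 2 3 4 5 9.5", [("X", 5), ("y", None)]))
# {'X': [1.0, 2.0, 3.0, 4.0, 5.0], 'y': 9.5}
```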
Date and Time fields
!DATE
specifies that the field has one of the date formats dd/mm/yy, dd/mm/ccyy,
dd-Mon-yy or dd-Mon-ccyy and is to be converted into a Julian day, where
dd is a 1- or 2-digit day of the month, mm is a 1- or 2-digit month of the year,
Mon is a three-letter month name (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec),
yy is the year within the century (00 to 99) and cc is the century (19 or 20).
The separators '/' and '-' must be present as indicated.
The dates are converted to days since 1899.
When the century is not specified, yy of 00-32 is taken as 2000-2032 and 33-99 as 1933-1999.
!DMY
specifies that the field has one of the date formats dd/mm/yy or dd/mm/ccyy
and is to be converted into a Julian day.
!MDY
specifies that the field has one of the date formats mm/dd/yy or mm/dd/ccyy
and is to be converted into a Julian day.
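The date rules above, including the two-digit century rule, can be sketched in Python with the standard `datetime` module. The guide says only "days since 1899". The exact day zero is therefore an assumption here, held in a hypothetical `ORIGIN` constant. Differences between dates are unaffected by this choice, but absolute values may be offset from ASReml's. The sketch also assumes month names must match the capitalisation shown above.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# ASSUMPTION: day zero of the count. The guide only says "days since 1899";
# the exact origin ASReml uses is not given here.
ORIGIN = date(1899, 12, 31)


def expand_year(year_text: str) -> int:
    """Apply the century rule: 00-32 -> 2000-2032, 33-99 -> 1933-1999."""
    year = int(year_text)
    if len(year_text) > 2:  # ccyy given explicitly
        return year
    return 2000 + year if year <= 32 else 1900 + year


def date_to_day(text: str, qualifier: str = "DATE") -> int:
    """Convert a !DATE, !DMY or !MDY field to a day count from ORIGIN."""
    if "-" in text:  # dd-Mon-yy or dd-Mon-ccyy
        if qualifier != "DATE":
            raise ValueError("dd-Mon forms are only listed for !DATE")
        dd, mon, yy = text.split("-")
        day, month = int(dd), MONTHS.index(mon) + 1
    else:
        first, second, yy = text.split("/")
        if qualifier == "MDY":
            month, day = int(first), int(second)
        else:  # !DATE and !DMY both read dd/mm
            day, month = int(first), int(second)
    return (date(expand_year(yy), month, day) - ORIGIN).days


print(date_to_day("01/03/2000") - date_to_day("28/02/2000"))  # 2 (2000 is a leap year)
```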
!TIME
specifies that the field has the format hh:mm:ss
and is to be converted into seconds past midnight, where
hh is hours (0 to 23), mm is minutes (0 to 59)
and ss is seconds (0 to 59). The separator ':' must be present as indicated.
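The !TIME conversion is simple arithmetic, hh × 3600 + mm × 60 + ss, with the ranges stated above. It is sketched here as a hypothetical Python helper, `time_to_seconds`.

```python
def time_to_seconds(text: str) -> int:
    """Convert hh:mm:ss to seconds past midnight, checking the stated ranges."""
    hh, mm, ss = (int(part) for part in text.split(":"))
    if not (0 <= hh <= 23 and 0 <= mm <= 59 and 0 <= ss <= 59):
        raise ValueError(f"time out of range: {text}")
    return hh * 3600 + mm * 60 + ss


print(time_to_seconds("01:02:03"))  # 3723
```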
Storage of alphabetic factor labels
Space is allocated dynamically for the storage of alphabetic factor
labels, with a default allocation of 2000 labels, each
16 characters long. If there are large !A factors (so that the
total number of labels across all factors will exceed 2000), you must specify the
anticipated size (to within, say, 5%).
If some labels are longer than 16 characters and the extra characters are significant, you must
lengthen the space for each label by specifying !LL c. For example,
 cross !A 2300 !LL 48
indicates that the factor cross will have about 2300 levels and needs
48 characters to hold the level names.
Note that only the first 20 characters of the labels are ever printed.
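The guide does not say exactly what happens to characters beyond the allocated label length. The Python sketch below assumes they are simply dropped. Under that assumption, labels that differ only after character 16 would collapse into a single level unless !LL provides more room. The helper `distinct_levels` and the example labels are hypothetical.

```python
def distinct_levels(labels: list[str], width: int = 16) -> list[str]:
    """Levels that remain distinct when labels are kept to `width` characters.

    Assumes characters beyond the allocated width are simply dropped.
    """
    kept: dict[str, None] = {}
    for label in labels:
        kept.setdefault(label[:width], None)  # preserve first-appearance order
    return list(kept)


crosses = ["LongParentName01xA", "LongParentName01xB"]
print(len(distinct_levels(crosses)))            # 1: the default 16 characters merge them
print(len(distinct_levels(crosses, width=48)))  # 2: room as with !LL 48
```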
!PRUNE
on a field definition line means that if fewer levels
are actually present in the factor than were declared,
ASReml will reduce the factor size to the actual number of levels.
Use !PRUNALL for this action to be taken on the current and
subsequent factors up to (but not including) a factor with the
!PRUNEOFF qualifier.
The user may overestimate the size of large alphabetic (!A) and integer (!I)
coded factors so that ASReml reserves enough space for the list.
Using !PRUNE means that the extra (undefined) levels will not appear
in the .sln file. Since it is sometimes necessary that factors not
be pruned in this way, for example pedigree/GIV factors,
pruning is only done if requested.
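The rules for a declared size n can be combined in one Python sketch. A warning is issued when n disagrees with the data. A too-small n is replaced by the actual count. A too-large n is kept unless pruning was requested. The function `factor_size` is hypothetical. Treating an omitted n as "use the actual count" is an assumption made for the sketch.

```python
import warnings
from typing import Optional


def factor_size(declared: Optional[int], actual: int, prune: bool = False) -> int:
    """Number of levels reserved for a factor after the data have been read.

    declared is the n given on the field definition (None if omitted);
    actual is the number of distinct levels found in the data.
    """
    if declared is None:
        return actual  # assumption: size taken from the data when n is omitted
    if declared != actual:
        warnings.warn(f"declared {declared} levels but found {actual}")
    if declared < actual:
        return actual  # nominated value too small: the correct value is used
    # Overestimate: the extra levels are kept unless !PRUNE was requested.
    return actual if prune else declared
```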
Reordering the factor levels
!SORT
declared after !A or !I on a field definition line causes
ASReml to sort the levels so that labels occur in
alphabetic/numeric order for the analysis.
As ASReml reads the data file, it encodes !I and !A
factor levels in the order they appear in the data,
so, for example, the user cannot tell whether SEX
will be coded 1=Male, 2=Female or 1=Female, 2=Male
without looking at the data file to see whether
Male or Female appears first in the SEX field.
If !SORT is specified, ASReml creates a lookup table after reading the data to select levels
in sorted order and uses this sorted order when forming the design matrices.
Consequently, with the !SORT qualifier,
the order of fitted effects will be 1=Female, 2=Male in the analysis
regardless of which appears first in the file.
This can lead to some confusion because some other operations are applied to the unsorted order.
In particular, any transformations are
performed as the data are read in, before the sorting occurs.
!SORTALL
means that the levels for the current and subsequent
factors are to be sorted.
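The lookup table described for !SORT can be sketched in Python as a map from first-appearance codes to sorted codes. The function `sorted_lookup` is hypothetical. Its alphabetic ordering is Python's code-point order, which may not match ASReml's treatment of case. The `numeric` flag shows why an !I factor needs a numeric rather than a text sort: as text, "100" would sort before "9".

```python
def sorted_lookup(levels: list[str], numeric: bool = False) -> dict[int, int]:
    """Map first-appearance codes (1-based) to codes in sorted order.

    numeric=True sorts the labels as numbers, as would suit an !I factor.
    """
    key = float if numeric else str
    rank = {label: k for k, label in enumerate(sorted(levels, key=key), start=1)}
    return {code: rank[label] for code, label in enumerate(levels, start=1)}


# Male happened to appear first in the data file, so it was coded 1.
print(sorted_lookup(["Male", "Female"]))  # {1: 2, 2: 1}
```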
Skipping input fields
!SKIP f
skips f data fields BEFORE reading this field.
It is particularly useful in large files with alphabetic fields that are not needed, as it saves ASReml the time required to process the alphabetic labels. For example,
 Sire !I !skip 1
skips the field before the field which is read as 'Sire'.
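The effect of !SKIP on field positions can be sketched in Python. Skipped fields are stepped over and never stored, so nothing is done with their contents. The helper `read_with_skips`, the data line and the field name `wt` are hypothetical, and whitespace-separated fields are assumed.

```python
def read_with_skips(line: str, spec: list[tuple[str, int]]) -> dict[str, str]:
    """Read named fields from a whitespace-separated line.

    spec holds (name, f) pairs: f fields are discarded before the named
    field is read, as with `name !SKIP f`.
    """
    values = line.split()
    record: dict[str, str] = {}
    pos = 0
    for name, skip in spec:
        pos += skip  # jump over the unwanted fields
        record[name] = values[pos]
        pos += 1
    return record


# Hypothetical line: an unwanted alphabetic ID, then the sire code, then a weight.
print(read_with_skips("Calf_0042 17 3.2", [("Sire", 1), ("wt", 0)]))
# {'Sire': '17', 'wt': '3.2'}
```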