The data fields are defined immediately after the job
title. They tell ASReml how many fields to expect in the data file and
what they are. No more than 10,000 variables may be read or formed.
Data field definitions:
should be given for all fields in the data file; data fields at the end of a data line that have no corresponding field definition are ignored,
must be presented in the order in which they appear in the data file,
must be indented one or more spaces,
can appear with other definitions on the same line.
Data fields can be transformed as they are defined (see below).
Additional data fields can be created by transformation; these should be listed after the data fields read from the data file.
Usually there will be a field definition for every data field.
For example, field definitions typical of a simple randomised block experiment are:
Randomised Block Experiment # Title Line
 Blocks * # coded 1...
 Treatments !A # alphabetic names
 yield # response variable
rcb.dat # data file
yield ~ mu Treatments !r Blocks # model line
Field definitions appear in the ASReml command file in the form
 SPACE Label [ FieldType ]
SPACE
a leading space is required on every field definition line.
Label
is an alphanumeric string to identify the field,
has a maximum of 31 characters, of which only the first 20 are printed,
must begin with a letter,
must not contain the special characters ., *, :, /, !, #, | or (,
must not be a predefined name.
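The label rules above can be checked mechanically. The following is a minimal Python sketch, not part of ASReml: the function names check_label and printed_form are hypothetical, and the rule about predefined names is not checked because the list of predefined names is not given here.

```python
# Characters the text says a field label must not contain.
FORBIDDEN = set(".*:/!#|(")
MAX_LABEL_LENGTH = 31   # maximum label length stated in the text
PRINTED_LENGTH = 20     # only this many characters are printed


def check_label(label: str) -> list[str]:
    """Return the rule violations for a candidate label (empty list if none)."""
    problems = []
    if not label or not label[0].isalpha():
        problems.append("must begin with a letter")
    if len(label) > MAX_LABEL_LENGTH:
        problems.append("longer than 31 characters")
    bad = sorted(set(label) & FORBIDDEN)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    return problems


def printed_form(label: str) -> str:
    """Return the part of the label that would be printed (first 20 characters)."""
    return label[:PRINTED_LENGTH]
```

For instance, check_label("yield") returns an empty list, while check_label("a.b") reports the forbidden character.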
FieldType
defines how a variable is interpreted as it is read and whether it is regarded as a factor or variate if specified in the linear model,
for a simple variate, leave FieldType blank or specify 1,
for a model factor, various qualifiers are required depending on the form of the factor coding, where n is the number of levels of the factor and s is a list of labels to be assigned to the levels:
* or n is used when the data field has values 1...n directly coding for the factor, unless the levels are to be labelled (see !L), for example Row *
!A [n] is required if the data field is alphanumeric; n must be specified if more than 2000 level names are present, for example Location !A,
!I [n] is required if the data is numeric but not 1...n; n must be specified if more than 1000 codes are present, for example Year !I,
!AS followed by the name of an earlier factor is required if the data field is similar to that previous !A or !I factor and is to be coded identically, for example in a plant diallel experiment
 Male !A 22 Female !AS Male # integrated coding,
!L followed by a list of labels is used when the data field is numeric with values 1...n and labels are to be assigned to the n levels, for example
 Sex !L Male Female
If there are many labels, they may be written over several lines by using a trailing comma to indicate continuation of the list.
!P indicates the special case of a pedigree factor; ASReml will determine the levels from the pedigree file.
In all these, a warning is printed if the nominated value for n does not agree with the actual number of levels found in the data; if the nominated value is too small, the correct value is used.
!G m [n] is used when m contiguous data fields are to be treated as a set or group of variates (n omitted or 1) or factor variables (n>1). For example
 X1 X2 X3 X4 X5 y
y ~ mu X1 X2 X3 X4 X5
can be expressed as
 X !G 5 y
y ~ mu X
so that the 5 variates can be referred to in the model as X by specifying X !G 5.
Date and Time fields
!DATE specifies the field has one of the date formats dd/mm/yy, dd/mm/ccyy, dd-Mon-yy or dd-Mon-ccyy and is to be converted into a Julian day, where
dd is a 1 or 2 digit day of the month, mm is a 1 or 2 digit month of the year, Mon is a three letter month name (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec),
yy is the year within the century (00 to 99) and cc is the century (19 or 20).
The separators '/' and '-' must be present as indicated.
The dates are converted to days since 1899.
When the century is not specified, yy of 0-32 is taken as 2000-2032 and yy of 33-99 is taken as 1933-1999.
!DMY specifies the field has one of the date formats dd/mm/yy or dd/mm/ccyy and is to be converted into a Julian day.
!MDY specifies the field has one of the date formats mm/dd/yy or mm/dd/ccyy and is to be converted into a Julian day.
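The date formats and the two-digit year rule described above can be sketched in Python as follows. This is an illustration only, not ASReml's implementation: the names expand_year and parse_date and the month_first flag are hypothetical. It stops at a calendar date and does not compute the day number, since the text gives the origin only as "days since 1899".

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def expand_year(year_text: str) -> int:
    """Apply the century rule: yy 0-32 -> 2000-2032, yy 33-99 -> 1933-1999."""
    year = int(year_text)
    if len(year_text) == 4:          # ccyy given explicitly
        return year
    return 2000 + year if year <= 32 else 1900 + year


def parse_date(text: str, month_first: bool = False) -> date:
    """Parse dd/mm/yy, dd/mm/ccyy, dd-Mon-yy or dd-Mon-ccyy.

    With month_first=True the slash forms are read as mm/dd/yy or mm/dd/ccyy.
    """
    if "/" in text:
        first, second, year_text = text.split("/")
        day, month = (second, first) if month_first else (first, second)
        month_number = int(month)
    else:
        day, month_name, year_text = text.split("-")
        month_number = MONTHS.index(month_name) + 1
    return date(expand_year(year_text), month_number, int(day))
```

Under this sketch, "5/3/32" is read as 5 March 2032 and "5/3/33" as 5 March 1933.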
!TIME specifies the field has the format hh:mm:ss and is to be converted into seconds past midnight, where
hh is hours (0 to 23), mm is minutes (0 to 59)
and ss is seconds (0 to 59). The separator ':' must be present as indicated.
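The conversion to seconds past midnight amounts to simple arithmetic. A minimal Python sketch, assuming a well-formed hh:mm:ss string (the function name seconds_past_midnight is hypothetical and this is not ASReml code):

```python
def seconds_past_midnight(text: str) -> int:
    """Convert an hh:mm:ss string to seconds past midnight."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    # Ranges as stated in the text: hh 0-23, mm 0-59, ss 0-59.
    if not (0 <= hours <= 23 and 0 <= minutes <= 59 and 0 <= seconds <= 59):
        raise ValueError(f"time out of range: {text}")
    return hours * 3600 + minutes * 60 + seconds
```

For example, "01:02:03" corresponds to 3600 + 120 + 3 = 3723 seconds.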
Storage of alphabetic factor labels
Space is allocated dynamically for the storage of alphabetic factor
labels, with a default allocation of 2000 labels,
each 16 characters long. If there are large
alphabetic factors (so that the
total across all factors will exceed 2000), you must specify the
anticipated size (within say 5%).
If some labels are longer than
16 characters and the extra characters are significant, you must
lengthen the space for each label by specifying the !LL qualifier with the required length. For example
 cross !A 2300 !LL 48
indicates the factor cross
will have about 2300 levels and needs
48 characters to hold the level names.
Note that only the
first 20 characters of the labels are ever printed.
!PRUNE on a field definition line means that if fewer levels
are actually present in the factor than were declared,
ASReml will reduce the factor size to the actual number of levels.
Use !PRUNEALL for this action to be taken on the current and
subsequent factors up to (but not including) a factor with the !PRUNEOFF qualifier.
The user may overestimate the size for large ALPHA and INTEGER
coded factors so that ASReml reserves enough space for the list.
Using !PRUNE will mean the extra (undefined) levels will not appear
in the .sln file. Since it is sometimes necessary that factors not
be pruned in this way, for example in pedigree/GIV factors,
pruning is only done if requested.
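The interaction between the declared number of levels and the number actually found can be summarised in a few lines of Python. This is a sketch of the behaviour as described in this section, not ASReml's code; the function name factor_size and the prune flag are hypothetical.

```python
def factor_size(declared: int, actual: int, prune: bool = False) -> int:
    """Return the factor size used, given declared and actual level counts."""
    if actual > declared:
        # Nominated value too small: the correct value is used.
        return actual
    if prune:
        # Pruning requested: shrink to the levels actually present.
        return actual
    # Otherwise the (over)estimated size is kept.
    return declared
```

With the earlier example of a factor declared with 2300 levels, a data file containing 2187 levels would keep size 2300 unless pruning is requested.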
Reordering the factor levels
!SORT on a field definition line will
cause ASReml to sort the levels so that labels occur in
alphabetic/numeric order for the analysis.
As ASReml reads the data file, it encodes !I and !A
factor levels in the order they appear in the data
so that for example, the user cannot tell whether SEX
will be coded 1=Male, 2=Female or 1=Female, 2=Male
without looking at the data file to see whether
Male or Female appears first in the SEX field.
If !SORT is specified, ASReml creates a lookup table after reading the data to select levels
in sorted order and uses this sorted order when forming the design matrices.
Consequently, with the !SORT qualifier,
the order of fitted effects will be 1=Female, 2=Male in the analysis
regardless of which appears first in the file.
This can lead to some confusion because some other operations will be applied to the unsorted order.
In particular any transformations are
performed as the data is read in and before the sorting occurs.
!SORTALL means that the levels for the current and subsequent
factors are to be sorted.
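The difference between coding levels in order of appearance and applying a sorted lookup afterwards can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than ASReml's algorithm: the function names are hypothetical, and only alphabetic labels are handled.

```python
def encode_in_order_of_appearance(values):
    """Assign codes 1, 2, ... to labels in the order they are first seen."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1
    return codes


def sorted_lookup(codes):
    """Map each as-read code to its position in alphabetic label order."""
    ordered = sorted(codes)  # labels in alphabetic order
    return {codes[label]: position
            for position, label in enumerate(ordered, start=1)}


sex = ["Male", "Female", "Male", "Female"]
as_read = encode_in_order_of_appearance(sex)  # Male seen first, so Male=1
lookup = sorted_lookup(as_read)               # as-read code -> sorted position
```

Here Male is coded 1 as the data are read, which is the coding any transformation would see, while the lookup places Female first for the analysis.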
Skipping input fields
!SKIP f will skip f data fields BEFORE reading this field.
It is particularly useful in large files with alphabetic fields
that are not needed, as it saves ASReml the time required to
classify the alphabetic labels. For example
 Sire !I !skip 1
will skip the field before the field which is read as 'Sire'.
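The effect of skipping on which column each definition reads can be sketched in Python. This is only an illustration of the rule that skipped fields are passed over before the field is read; the function name column_positions, the labels Animal and weight, and the sample record are hypothetical.

```python
def column_positions(definitions):
    """Given (label, skip) pairs in definition order, return the
    0-based data column that each label reads."""
    positions = {}
    column = 0
    for label, skip in definitions:
        column += skip              # pass over unwanted fields first
        positions[label] = column
        column += 1                 # move on to the next data field
    return positions


# Hypothetical definitions: Sire skips one field before it is read.
definitions = [("Animal", 0), ("Sire", 1), ("weight", 0)]
positions = column_positions(definitions)

record = "A17 Holstein 204 512.5".split()
row = {label: record[col] for label, col in positions.items()}
```

In this sketch the alphabetic field "Holstein" is never assigned to a label, so it would not need to be stored or coded.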