The datafile

Introduction

The first step in an ASReml analysis is to prepare the data file.

The standard format of an ASReml data file is to have the data arranged in columns/fields with a single line for each sampling unit. The columns contain variates and covariates (numeric), factors (alphanumeric), traits (response variables) and weight variables in any order that is convenient to the user. The data file may be free format, fixed format or a binary file.

Free format data files

The data are read in free format (SPACE, COMMA or TAB separated) unless the file name has the extension .bin for real (single precision) binary, or .dbl for double precision binary (see below). Important points to note are as follows:

blank lines are ignored,

column headings, field labels or comments may be present at the top of the file provided that the !skip qualifier is used to skip over them,

NA, * and . are treated as coding for missing values in free format data files,

if missing values are coded with a unique data value (for example, 0 or -9), use !M to flag them as missing or !D to drop the data record containing them.
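
The free format rules above can be illustrated with a minimal Python sketch. It is not ASReml code and does not reproduce ASReml's reader. The function name `parse_free_format` and its `skip` and `missing_code` arguments are hypothetical and only mimic the ideas behind !skip and !M. The sample text uses the heading and first data line of the example at the end of this section plus two made-up lines that contain missing-value codes.

```python
import re

MISSING_TOKENS = {"NA", "*", "."}  # missing-value codes listed in the text


def convert(token, missing_code=None):
    """Turn one field into a float, a label, or None for a missing value."""
    if token in MISSING_TOKENS:
        return None
    try:
        value = float(token)
    except ValueError:
        return token  # alphanumeric field (factor label) kept as text
    return None if value == missing_code else value


def parse_free_format(text, skip=0, missing_code=None):
    """Parse SPACE, COMMA or TAB separated text into a list of rows.

    skip: number of leading lines to skip (hypothetical stand-in for !skip)
    missing_code: numeric value to treat as missing (stand-in for !M)
    """
    rows = []
    for line in text.splitlines()[skip:]:
        tokens = [t for t in re.split(r"[ ,\t]+", line) if t]
        if not tokens:
            continue  # blank lines are ignored
        rows.append([convert(t, missing_code) for t in tokens])
    return rows


# Heading and first data line from the example; the last two lines are made up.
sample = "Source SeedZn LeafZn\n1 61 24.1\n\n2 NA 16.0\n2 -9 *\n"
rows = parse_free_format(sample, skip=1, missing_code=-9)
```

Here the heading line is skipped, the blank line is dropped, and `NA`, `*` and the user-chosen code `-9` all end up as `None`.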

Comma separated values

You may use Excel to prepare your data file as a comma delimited file,

comma delimited files whose file name ends in .csv or for which the !CSV qualifier is set recognise empty fields as missing values,

a line beginning with a comma implies a preceding missing value,

consecutive commas imply a missing value,

a line ending with a comma implies a trailing missing value,

if the file name does not end in .csv and the !CSV qualifier is not set, commas are treated as white space.
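
The difference between the two ways of treating commas can be sketched in Python. This is an illustration only, using the standard `csv` module rather than ASReml; the helper names `parse_csv_fields` and `commas_as_whitespace` are hypothetical, and the sample lines are built from values in the example at the end of this section with some fields blanked out.

```python
import csv
import io


def parse_csv_fields(text):
    """Read comma delimited text, mapping empty fields to None (missing)."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # blank line
        rows.append([None if f.strip() == "" else f.strip() for f in fields])
    return rows


def commas_as_whitespace(text):
    """Read the same text with commas acting only as separators."""
    return [line.replace(",", " ").split() for line in text.splitlines() if line.strip()]


# Leading, consecutive and trailing commas, one case per line.
sample = ",61,24.1\n1,,23.8\n2,51,\n"
csv_rows = parse_csv_fields(sample)
ws_rows = commas_as_whitespace(sample)
```

Read as comma delimited, every line keeps three fields, one of them missing. Read with commas as white space, each line collapses to two fields, so the missing value is silently lost.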

General comments

characters following # on a line are ignored, so this character may not be used in alphanumeric fields,

blank spaces, tabs and commas must not be used (embedded) in alphanumeric fields unless the label is enclosed in quotes; for example, the name Willow Creek would need to appear in the data file as 'Willow Creek' to avoid error,

the $ symbol must not be used in the data file,

alphanumeric fields have a default size of 16 characters; use the !LL qualifier to extend the size of factor labels stored,

extra data fields on a line are ignored,

if there are fewer data items on a line than ASReml expects, the remainder are taken from the following line(s), except in .csv files where they are taken as missing. If you end up with half the number of records you expected, this is probably the reason,

all lines beginning with ! followed by a blank are copied to the .asr file as comments for the output; their contents are ignored,

a data file line may not exceed 2000 characters; if the data fields will not fit in 2000 characters, put some on the next line.
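
The way short lines can halve the record count is easier to see in a small Python sketch. It is a simplified model under stated assumptions, not ASReml's reader: the function name `assemble_records` is hypothetical, and the sketch assumes that once a record has been completed from a following line, whatever is left on that line counts as extra fields and is ignored.

```python
def assemble_records(lines, n_fields):
    """Group whitespace-separated fields into records of n_fields items.

    Text after '#' is dropped, lines starting with '! ' are treated as
    comments, extra fields are ignored, and a short record is completed
    from the following line(s). Leftover fields on a completing line are
    discarded (an assumption of this sketch).
    """
    records = []
    current = []
    for line in lines:
        if line.startswith("! "):
            continue  # comment line, not data
        tokens = line.split("#", 1)[0].split()
        if not tokens:
            continue  # blank line or comment-only line
        needed = n_fields - len(current)
        current.extend(tokens[:needed])  # take only what the record still needs
        if len(current) == n_fields:
            records.append(current)
            current = []
    return records


# First four data lines of the example at the end of this section.
lines = ["1 61 24.1", "1 63 23.8", "2 51 16.0", "2 64 19.0"]
as_declared = assemble_records(lines, 3)   # three fields expected per record
one_too_many = assemble_records(lines, 4)  # four fields expected per record
```

With three fields expected, the four lines give four records. With four fields expected, each record borrows its last value from the next line, and only two records result.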

Fixed format files

The format must be supplied with the !FORMAT qualifier. However, if all fields are present and are separated, the file can be read in free format.

Multiple data files

Sometimes data are split over several files. Where the separate files relate to, say, separate experiments in a series of similar experiments and a combined analysis is required, the data files can be combined through !INCLUDE statements.

Binary format data files

Conventions for binary files are as follows:

binary files are read as unformatted Fortran binary in single precision if the filename has a .bin or .BIN extension,

Fortran binary data files are read in double precision if the filename has a .dbl or .DBL extension,

ASReml recognises the value -1e37 as a missing value in binary files,

Fortran binary in the above means all real (.bin) or all double precision (.dbl) variables; mixed types, that is, integer and alphabetic binary representations of variables, are not allowed in binary files,

binary files can only be used in conjunction with a pedigree file if the pedigree fields are coded in the binary file so that they correspond with the pedigree file (this can be done using the !SAVE qualifier to form the binary file), or the identifiers are whole numbers less than 9,999,999 and the !RECODE qualifier is specified.
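
The missing-value convention for binary files can be sketched with Python's `struct` module. This shows only how a -1e37 sentinel behaves in single precision. The byte order (little-endian) and the absence of any Fortran record markers are assumptions of the sketch, not statements about the layout ASReml expects, and the helper names `pack_single` and `unpack_single` are hypothetical.

```python
import struct

MISSING = -1e37  # value the text says is treated as missing in binary files


def pack_single(values):
    """Pack values as little-endian 4-byte reals, with None stored as MISSING."""
    data = [MISSING if v is None else v for v in values]
    return struct.pack(f"<{len(data)}f", *data)


def unpack_single(raw):
    """Unpack 4-byte reals, mapping the stored sentinel back to None."""
    n = len(raw) // 4
    # -1e37 is not exactly representable in single precision, so compare
    # against the value as it comes back from a 4-byte round trip.
    sentinel = struct.unpack("<f", struct.pack("<f", MISSING))[0]
    return [None if v == sentinel else v for v in struct.unpack(f"<{n}f", raw)]


raw = pack_single([61.0, None, 24.5])
values = unpack_single(raw)
```

The point of the round-trip comparison is that a sentinel written in single precision no longer equals the double precision literal -1e37 exactly, so a naive equality test against the literal would miss it.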

Example

This data file has three fields and a header line identifying them. The heading line is not used by ASReml and must be skipped when the file is read.

 Source SeedZn LeafZn
  1 61 24.1
  1 63 23.8
  2 51 16.0
  2 64 19.0
  6 69 22.6
  6 75 27.9
  6 93 24.6
  5 85 31.3
  5 86 35.4
  5 80 20.9
  7 47 13.9
  7 49 14.0
  7 57 17.3
  8 50 10.8
  8 48 12.3
  8 46 13.9
 11 69 26.8
 11 79 31.7
 12 64 22.5
 12 68 24.2
 13 48 13.4
 13 66 15.1
 13 53 14.1
 14 39 11.7
 14 40 11.5
 14 45 12.3
 17 63 24.8
 17 64 25.0
 17 70 21.4
 18 63 28.2
 18 61 23.0
 19 36 11.0
 19 29 10.2
 19 29 10.9
 21 57 18.6
 21 68 21.2
 21 61 18.2
 24 84 25.2
 24 64 25.1
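
As a quick check outside ASReml, a file laid out like this one can be read with a few lines of Python. The sketch below uses the heading line and the first seven data lines of the example, skips the heading, and counts records per Source. Treating Source as a label rather than a number is an assumption made for the illustration.

```python
from collections import defaultdict

# Heading line plus the first seven data lines of the example above.
example = """\
 Source SeedZn LeafZn
  1 61 24.1
  1 63 23.8
  2 51 16.0
  2 64 19.0
  6 69 22.6
  6 75 27.9
  6 93 24.6
"""

lines = example.splitlines()
header = lines[0].split()  # field names, not data
body = lines[1:]           # everything after the one heading line

records_per_source = defaultdict(int)
leaf_zn_values = []
for line in body:
    source, seed_zn, leaf_zn = line.split()
    records_per_source[source] += 1  # Source kept as a label (assumption)
    leaf_zn_values.append(float(leaf_zn))

counts = dict(records_per_source)
```

A count of records per level that matches what you expect is a cheap way to confirm that the heading line was skipped and that every line supplied all three fields.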
