Merging two data files
The MERGE directive, described in this chapter,
is designed to combine information
from two files into a third file with a range of qualifiers
to accomodate various scenarios.
developed with assistance from Chandrapal Kailasanathan
to replace the !MERGE qualifier (described below)
which had very limited functionality.
The MERGE directive is placed BEFORE
the data filename lines.
It is an independent part of the ASReml job in the sense that
none of the files are necessarily involved in the subsequent analyses
performed by the job, and there may be multiple
MERGE directives. Indeed, the job may just consist of
a title line and MERGE directives.
The !MERGE qualifier, on the other hand, combines information from
two files into the internal data set which ASReml uses for analysis
and does not save it to file.
It has very limited in functionality.
The files to be merged
must conform to the following basic structure:
the data fields must be TAB, COMMA or SPACE separated,
there will be one
heading line that names the columns in the file,
the names may not have embedded spaces,
the number of fields is determined from the number of names,
are implied by adjacent commas in comma delimited files. Otherwise,
they are indicated by NA, * or . as in normal ASReml files.
the merged file will be TAB separated if a .txt file,
COMMA separated if a .csv file and SPACE separated otherwise.
The basic merge command is
MERGE file1 !WITH file2 !TO newfile .
Typically files to be merged will have common key
fields. In the basic merge, ( !KEY not specified)
any fields having the same names are taken as the key fields and
if the files have no fields in common, they are assumed to match on row number.
Fields are referenced by name (case sensitive).
The full command is:
MERGE file1 [ !KEY keyfields ]
[ !KEEP ] [ !SKIP fields ]
!WITH file2 [ !KEY keyfields ]
[ !KEEP ] [ !NODUP ] [ !SKIP fields ]
!TO newfile [ !CHECK ] [ !SORT ].
Fields in the merged file will be arranged with
key fields followed by other fields from the primary file
and then fields from the secondary file.
List of MERGE qualifiers
requests ASReml confirm that fields having a common name have the same contents.
are reported to the .asr file. If there are fields
with common names which are not key fields,
and !CHECK is omitted, the
will be assumed different and both versions will be copied.
names the fields which are to be used for matching records in the files.
If the fields have the same name in both file headers, they need
only be named in association with the primary input file.
If the key fields are the only fields with common names, the
!KEY qualifier may be omitted altogether.
If key fields are not nominated and there are no common field names,
the files are interleaved.
instructs ASReml to include in the merged file records from the input file
which are not matched in the other input file. Missing values are inserted
as the values from the other file. Otherwise, unmatched records are discarded.
!KEEP may be specified with either or both input files.
Typically when a match occurs, the field contents from the second
file are combined with the field contents of the first file to produce the
merged file. The !NODUP qualifier, which may only be associated with the second file, causes the field contents for the nominated fields
from the second file only be inserted once into the merged file.
For example, assume we want to merge two files containing data from
sheep. The first file has several records per animal containing
fleece data from various years. The second file has one record per animal
containing birth and weaning weights. Merging with !NODUP bwt wwt
will copy these traits only once into the merged file.
is used to exclude fields from the merged file.
It may be specified with either or both input files.
instructs ASReml to produce the merged file sorted on the key fields.
Otherwise the records are return in the order they appear in the
The merging algorithm is briefly as follows: The secondary file
is read in, skip fields being omitted, and the records
are sorted on the key fields. If sorted output is
required, the primary file is also read in and sorted.
The primary file (or its sorted form) is then processed line by line
and the merged file is produced. Matching of key fields is on
a string basis, not a value basis.
If there are no key fields, the files are
merged by interleaving.
If there are multiple records
with the same key, these are severally matched. That is
if 3 lines of file 1 match 4 lines of file 2, the merged
file will contain all 12 combinations.
Key fields have different names
!MERGE file1 !Key key1a key1b !WITH file2 !KEY key2a key2b !to newfile
Key fields have common name and other fields are also duplicated
!MERGE file1 !Key keya keyb !WITH file2 !to newfile !CHECK}
!MERGE file1 !Key key !KEEP !WITH file2 !to newfile
will discard records from file2 that do not match
records in file1
but all records in file1 are retained.
Omitting fields from the merged file
!MERGE file1 !Key key !skip s1a s1b !WITH file2 !skip s2a s2b !to newfile
Single insertion merging
!MERGE adult.txt !Key ewe !KEEP !WITH birth.txt !KEEP !TO newfile !NODUP bwt
!MERGE c f
!SKIP n !MATCH a b
may be specified on a line following the datafile
The purpose is to combine data fields from the (primary) data file
with data fields from a secondary file (f).
The effect is to open the named file (skip n lines)
and then insert the columns from the new file into field
positions starting at position c.
a b is specified, ASReml checks that the field
a (0< a < c) has the same value as field b.
If not, it is assumed that the merged file has some missing records
and missing values are inserted into the data record and the line
file is kept for comparison with the next record.
At this stage it is expected that the lines in the
are in the same order as the corresponding lines occur in the
primary data file, and that there are no extraneous lines in the
file. (It is proposed to extend this so the orders do not need
to agree and that multiple lines in the primary file could
be merged with the same line of the
For example, assuming the field definitions define 10 fields,
PRIMARY.DAT !skip 1
!MERGE 6 SECOND.DAT !SKIP 1 !MATCH 1 6
would obtain the first five fields from PRIMARY.DAT and the next five
from SECOND.DAT, checking that the first field in each file has the
Thus each input record is obtained by combining information from
each file, before any transformations are performed.
Return to start