RHMAPPER, Installation and User's Guide

Version 1.22, January 10, 1998

- Lincoln Stein < lstein@genome.wi.mit.edu>

Introduction
- Abstract
- Features
Installation
Using RHMAPPER
Using the place_markers Script
License Agreement
References
Appendix: The Genebridge 4 Hybrid Panel Order

RHMAPPER is a software package for creating radiation hybrid maps [1, 2]. It was designed to accommodate very large scale radiation hybrid maps, and has been used successfully to create whole genome maps containing thousands of markers at the Whitehead Institute/MIT Center for Genome Research [3].

Features

Maximum Likelihood Calculations on Diploid RH Panels

RHMAPPER uses a hidden Markov model (HMM) [4] to perform maximum likelihood calculations on multipoint maps. It can be used on data from radiation hybrid panels derived from diploid cells, such as those used in whole genomic mapping. It can also be used for haploid panels.

Accomodation of Laboratory Error

A central problem with radiation hybrid mapping is laboratory error. Inevitable errors in scoring radiation hybrid results will lead, in the best case, to map expansion. The maximum likelihood model used by RHMAPPER accomodates laboratory errors by assigning a priori false positive and negative rates, and allowing for such errors in the maximum likelihood calculation. This has the effect of minimizing map expansion (at the cost of some reduction in lod scores).

Detection of Laboratory Error

RHMAPPER can flag individual assays that have a high probability of being erroneous. The assays can then be rechecked experimentally.

Client Server Architecture

RHMAPPER is built on a client/server architecture. The server is written in optimized C and is responsible for the computation-intensive maximum likelihood calculation. The remainder of the software consists of a series of Perl scripts tied together by a simple front end. This architecture allows the map building routines to be modified in an incremental and experimental manner. In fact, the client and server can reside on separate machines.

Built-In Programming Language

90% of RHMAPPER is written in Perl. The front end is in fact just a Perl interpreter. You can type in commands and have them evaluated immediately (they're just predefined Perl subroutines), or you can create your own subroutines and macros, complete with all of Perl's control constructs and functions. RHMAPPER will also run in batch mode.

Installation

The source code is distributed, as a compressed tar archive at http://www-genome.wi.mit.edu/ftp/pub/software/rhmapper/ (World Wide Web), or via anonymous FTP at ftp-genome.wi.mit.edu, directory /pub/software/rhmapper.

Requirements

Hardware

RHMAPPER has been tested on the following systems:

Sun SPARC10 running SunOS 4.3
DEC Alpha series running OSF/1 2.0 and 3.0
Intel Pentium CPU running Linux 1.2.13 (command line editing doesn't work on this platform because of deficiencies in the prerelease Perl readline library).

Required Software

Perl

RHMAPPER requires Perl, version 5.003 or higher. Perl can be obtained from multiple locations including:

Although it is possible that the Windows and Macintosh ports of Perl will support RHMAPPER this has not been tested!

An ANSI C Compiler

You will need an ANSI C compiler to compile the server component of RHMAPPER, rhmapserver. The compiler that comes with SunOS is non-standard and is known to have problems compiling rhmapserver. The server compiles well with the freely-available gcc compiler, as well as with almost all other C compilers. You'll also need a C compiler to compile the BinHex utility described below.

Optional Software

The Perl 5 Term::Readline Library

If the Perl readline library is installed, the command-line interpreter will support the cursor keys, command completion, and cut-and-paste using Emacs key bindings. Perl 5.003 comes with the stubs for readline installed; however you'll need to install the full package, which includes a module called "readline.pm". A copy of the Readline library is included in this package.

Some architectures require you to install the Ioctl module in order for Term::Readline to work. That's included too.

The Berkeley DB_File Package

Perl5 comes with interface modules for several different DBM-style databases. RHMAPPER was originally designed to work with any of them, but because of performance problems with larger databases, it now only works with the "DB_File" interface. The DB_File interface is part of the perl5.003 distribution. You'll also need to compile and install the DB library binary, libdb. A copy of the library included in this distribution for your convenience. Further information is available at:

ftp://ftp.cs.berkeley.edu/ucb/4bsd/db.tar.Z

BinHex

RHMAPPER comes with an ancillary Perl script called draw_map.pl that can output a map in Macintosh Quickdraw format (chosen because it is a convenient form for biologists and is easily edited). A C program called BinHex is useful for converting the output of this program into a text-only format that is readily transferred from the Unix machine to a Macintosh via e-mail or FTP.

BinHex was written at the Whitehead, and is included in this package.

The NetPBM Utilities

The NetPBM utilities are a package of graphics conversion routines for Unix machines. One of these routines, picttoppm, can be used to convert Macintosh pictures into a variety of other formats, including Postscript and GIF. NetPBM is available at many locations, including:

ftp://ftp.x.org/R5contrib/netpbm-1mar1994.tar.gz There is a bug in the distribution of picttoppm that prevents the -fontdir switch from working properly (this is used to direct picttoppm to display certain X11 fonts for Macintosh fonts). A patch to picttoppm.c that fixes this bug is included in this package.

XV or the ImageMagick Utilities

XV and ImageMagick are freely-available XWindows utilities for displaying images. By creating a pipe that includes picttoppm and one of these displayers, RHMAPPER can pop up a window to display a map.

Locations:

XV: ftp://ftp.cis.upenn.edu/pub/xv/
ImageMagick: ftp://ftp.x.org/contrib/applications/ImageMagick/

Compiling the C Code

After you've unpacked the RHMAPPER distribution, cd into the rhmapper directory and edit the top level Makefile to suit your preferences. The following settings are defined:

CC: The C compiler to use
OPTS: The compiler options to use.
PERL: The path to Perl. During installation this path is prepended to the top of all Perl scripts.
INSTALLDIR: The directory where the rhmapper distribution will be installed.
INSTALL: The "install" program and any options.

When these values have been set to your liking type make and make install.

If you wish to install BinHex, you should now enter the BinHex directory and examine the Makefile, making the appropriate changes. You'll have to adjust one define to suit the endianness of your machine, as explained in the Makefile.

Adjusting the Perl Path

The front end rhmapper, and all the Perl scripts in the subdirectory bin assume that the Perl interpreter is located at /usr/local/bin/perl5. If Perl is located somewhere else, you'll have to change the path name on the top line of rhmapper and all scripts ending in the suffix .pl. Most sites use /usr/local/bin/perl instead of /usr/local/bin/perl5. It may be easiest just to create a link from perl5 to perl.

Adjusting the Settings in rhmapper.conf

During installation, make will copy of the template file, rhmapper.conf.in to the global configuration file rhmapper.conf. You should review the settings in rhmapper.conf and adjust them appropriately.

The settings are all PARAMETER=VALUE pairs. Whitespace before or after the = sign is ignored. Use single and double-quotes if you need to include whitespace within the value (shouldn't be necessary). The parameters are used to establish defaults when rhmapper starts up. Their values are accessible from within the interpreter, and they can be examined and set while the program is running.

Certain parameter values have special meanings:

~: The current user's home directory
~user: The home directory of user user
%u: The name of the current user.
%w: The name of the working directory (same as ".")
%r: The name of the rhmapper root directory (where the rhmapper executable is installed).

For example, you can indicate that the current user's databases are located under the rhmapper root in a directory called "user-DATA" with this configuration setting:

DATAFILES = %r/%u-DATA

Two new features in version 1.22 make it easier to deal with rhmapper's various settings. First, each user of the software may have a personal settings file, ordinarily stored in his home directory in a file named .rhmapper. Any or all parameters defined in the global settings file can be overriden in this personal file. This allows users to maintain their own private project databases, or for groups of users to share the same project database.

Secondly, project-specific settings, including the error rate, retention frequency and parameters that adjust the way the mapping software works, may be saved to the project database automatically when the user leaves the project, and restored when the user reopens the project. The settings are user-specific so that one user's preferences do not interfere with another's.

Both of these new options can be disabled if need be.

Parameters

USE_READLINE

Set this to 1 (or any other non-zero value) if you've installed the Term::Readline library and wish to use it.

DATAFILES

The files containing the RH data and maps are kept in a subdirectory below the top of the rhmapper tree. Ordinarily this directory is called DATA. Set this parameter to any absolute or relative path in order to change the default location.

USER_CONF

Each user can have a personal configuration file. The format of this configuration file is identical to the global file. Parameter=value pairs in the personal configuration file override those in the global file. Parameters not present in the personal configuration file are inherited. This parameter gives the path to the personal configuration file, usually ~/.rhmapper. To disable personal configuration files completely, comment out this parameter or set it to an empty value.

PERSISTENT_PARAMETERS

At the end of each session, and when switching from project to project, a variety of common project-specific parameters are saved to a database file named "Parameters". Project-specific parameters are always user-specific, even when the same global parameters database is in use. This setting gives the path to the directory in which this database file will be stored, usually the same as the DATAFILES directory. To disable persistent parameters, comment out this setting or set it to an empty value.

RHMAPSERVER

This is the path to the rhmapserver program. Ordinarily it is kept in the subdirectory bin within the rhmapper tree. Change this to any absolute or relative address of your choice.

RHMAPPORT and RHMAPHOST

These parameters are set when you want to run the server on a different host from the Perl clients, and specify the host name and the port number to connect to. Getting this set up requires fiddling with the /etc/services and inetd.conf files on the server host and will be documented when we get closer to a true release of this software.

ALPHA and BETA

These two parameters are estimates of laboratory error, and indicate the probability, per hybrid/marker pair, that the result is a false negative (ALPHA) or a false positive (BETA). The mapping results are relative insensitive to the actual values of these parameters, and you will get reasonable results with a range of values. I suggest setting these values to some value between 0.001 (0.1%) and 0.01 (1%). (It's also a good idea to get the real laboratory error rate down as far as possible by performing every experiment in duplicate!)

RETENTION_FREQUENCY, CALCULATE_RF_THRESHOLD, and ALWAYS_USE_UNIFORM_RF

These parameters affect the maximum likelihood calculation, which needs to have an a priori estimate of chance that a hybrid will retain a portion of the genome at random. RETENTION_FREQUENCY sets this value (approximately 40% for the Genebridge 4 panel). RHMAPPEr can calculate this value on the fly when the data is loaded, but it needs a certain number of markers before this estimate is meaningful; CALCULATE_RF_THRESHOLD specifies how many markers must be present before RHMAPPER will calculate this value.

RHMAPPER can be set to assume that retention frequencies are constant among all the hybrids (ALWAYS_USE_UNIFORM_RF set to 1), or can be instructed to calculate retention frequencies for each individual member of the hybrid panel. Unfortunately this slows down the calculation somewhat so we don't have extensive experience mapping under this assumption. I recommend using constant retention frequencies until we have more experience with the effects of the non-uniform setting.

MISSING_FREQUENCY

The maximum likelihood calculation needs an a priori estimate of how frequently data is missing from the data set. This number is the probability that a single hybrid/marker assay will be missing. The algorithm is relatively insensitive to the exact value of this parameter as long as it is nonzero (if you set the value to zero the server will ungracefully crash if there's any data's missing!)

CONVERGENCE_THRESHOLD

The maximum likelihood calculation uses an E-M (Expectation/Maximization) algorithm to converge on the correct solution. This value determines how close a convergence you require. Smaller values give more accurate results, but the convergence takes longer. A value of 0.01 seems to be adequate for map construction. You might want to reduce it to 0.001 or less to make the distance calculations as accurate as possible in the final maps.

TWO_POINT_CUTOFF

During initial data characterization and framework construction, it's useful to have RHMAPPER store lod scores for linkage between marker pairs. It's rarely useful to store non-significant lod scores, and TWO_POINT_CUTOFF determines the cutoff value (lod 3.0 by default).

FRAMETHRESH

When adding markers to a framework, the framework is tested with local permutations after each addition. If the testing code detects an alternative order within FRAMETHRESH of the best order, the marker is not added. This global is also used during framework construction to discard poorly-ordered marker triples.

PLACEMENT_LINKAGE

When placing new markers onto framework maps RHMAPPER will discard any marker that doesn't link to at least one framework marker with the lod score specified in this parameter (default 5.0).

PLACEMENT_CUTOFF

When placing markers onto framework maps, RHMAPPER reports all the alternative placements for a marker down to some log-likelihood difference level. This determines the cutoff. The default is 3.0, meaning that placements that are a lod of 3.0 or greater less likely than the most likely position will not be reported.

PLACEMENT_TOO_FAR

When placing markers onto framework maps, RHMAPPER will reject any marker which maps too far off one of the ends of the framework. Such markers usually contain errors. This variable determines how far is too far, and is set to 15 cR by default.

PLACEMENT_EXPANSION_TOLERANCE

Obsolete parameter

PLACEMENT_UNIQUE

Obsolete parameter

PLACEMENT_EXPANSION_TOLERANCE

Obsolete parameter

PICTTOPPM

If you have picttoppm installed, this parameter gives the full path to the program.

XV

If you have an image displayer installed, this parameter gives its path (we use XV, but "display", part of the ImageMagick package will work just as well. Others haven't been tried).

MAIL

This is the full path to the mailer program on your system. We use it to send out automatic reports during nightly mapping and to mail pictures.

BINHEX

Path to the BinHex program, for transporting Macintosh pictures.

DEBUG

Set this to 1 to enable additional diagnostic information.

You are free to add your own definitions to the configuration file. They will be imported into the shell as global variables that you can test and set like any other.

Using `RHMAPPER`

Introduction

Contents of the `rhmapper` Directory

rhmapper*

This is the rhmapper front end interpreter. You will generally do all your mapping within the environment provided by the rhmapper shell.

rhmapper.conf

The configuration file for rhmapper.

DATA/

Default location for data files.

util/

Library files used by rhmapper and other programs.

bin/

The rhmapserver executable and a number of external Perl scripts called by rhmapper to perform useful tasks.

One of these scripts, place_markers, can be used to map your data to the Whitehead frameworks without learning any of the details of RHMAPPER. See Using the place_markers Script for more details.

server/

Source code for rhmapserver.

testdata

A directory containing example data files to load and play with. A set of markers from chromosome 18 is there, as well as the complete set of mapped markers from the Whitehead's May 1996 data release:

WIMAPS.dat -- the complete data set
wiframeworks.dat -- just the markers used to create the framework maps

local_hacks/

A few (maybe quite a few) Perl scripts used at the Whitehead Institute that are probably not of much use to the rest of the world. Among other things, they generate reports comparing the radiation hybrid orders to orders derived from genetic and YAC content mapping.

Using the RHMAPPER Shell

To start up rhmapper, type its name at the command line:

   prego% rhmapper

After a short delay during while rhmapper starts up and initializes the server, you will be greeted with a command line prompt:

    1> print "Hello world!"
    Hello world!
    2>

You can now interact with the shell, typing commands and getting responses.

Editing Commands and the Command History

If you have installed the Perl Term::Readline library and are using an Xterm or a vt100-compatible terminal emulator, you can edit the line of text using the cursor keys to move the insertion point around. Many emacs-style keystrokes will work, including ^A to move to the beginning of the line, ^E to move to the end, ^K to kill everything from the insertion point to the end of the line, and ^Y to yank back the killed text.

The interpreter maintains a history list of all previous commands. You can move through the history list with the up and down cursor keys, edit previous commands and re-execute them by hitting return. In addition, you can re-execute commands by typing an exclamation point (!) followed by the number of the command to repeat. You can view the complete history of the session with the command history (this works even if Term::Readline is not installed).

You can type a portion of a command followed by the tab key and the interpreter will attempt to complete it for you. If there are several possible completions, the interpreter will display the alternatives. As of version 1.22, a limited form of context-dependent command completion is also performed. For example, if you type a command that ordinarily operates on groups and press the tab key, the interpreter will display all the current groups.

Commands and Perl Expressions

RHMAPPER is an enhanced Perl interpreter. It will accept, and evaluate, any valid Perl expression. In addition, certain commonly-used mapping functions are recognized as high-level commands that use a shell-like syntax, complete with output redirection and pipes.

For example, the "evaluate" command is used to evaluate a potential marker order and give the likelihood score and breakeage frequencies:

  4> evaluate A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07
  NAME                   BREAK FREQ      cR
  A152WG9                   0.035        3.6
  MR7497                    0.086        9.0
  GCT-D18S852               0.118       12.6
  MR10399                   0.084        8.8
  GCT5D07                   0.000        0.0

  LIKELIHOOD = -72.699698
  MAP LENGTH = 33.88

However, evaluate can also be used with perl syntax, in which case it returns the likelihood score and the breakeage frequencies as function results:

  5> ($like,@theta)=evaluate('MR7774','MR6986','A058YG5','AFM254YD5')

Note that single quotes around string constants are recommended for marker names. Otherwise names like WI-453 will confuse Perl into trying a subtraction.

Many frequently-used commands have shortcut abbreviations. For example, the "load_data" command can be abbreviated "ld".

Most of the built-in commands produce printed output. This can be undesirable when incorporating them into subroutines. Output can be suppressed by setting the global variable $QUIET to a nonzero value:

  5> $QUIET=1;
  6> ($like,@theta)=evaluate('MR7774','MR6986','A058YG5','AFM254YD5')
  7> $QUIET=0;

In order to allow you to enter Perl expressions that span more than a line (such as subroutines), the shell defers execution until the expression is "complete", meaning that the number of open and close brackets, braces and parenthesis are balance. Therefore this will work:

  5> sub count_ones {
  cont: my $vector = shift;
  cont: my $count = $vector=~tr/1/1/;
  cont: print "$count ones\n";
  cont: }
  6> count_ones 10101010110101010101
  9 ones

You can use a backslash to force continuation in circumstances in which the interpreter would screw with simple bracket counting:

  7> sub count_ones \
  8> { my $vector = shift;
  ....

Type // or q to back out of a continuation you wish to abort.

System Calls and Output Redirection

The RHMAPPER shell recognizes output redirection. You can redirect output to a file using ">" and to a pipe using "|". For example:

  8> evaluate MR7774 MR6986 A058YG5 AFM254YD5 > ./temporary.file
  9> evaluate MR7774 MR6986 A058YG5 AFM254YD5 | mail lstein

You can execute any UNIX shell command by issuing the "system" command:

  1> system df .
Filesystem          512-blks        used       avail capacity
gumbo:/usr/home/     1927324     1849584       58468    97%

Type "system" by itself to start up a UNIX shell.

The ">" and "|" redirection commands do not work while you're inside a Perl subroutine -- they only apply to commands typed at the top level of the RHMAPPER shell. However, a utility subroutine redirect() lets you achieve the same effect:

   sub remap {
       load_groups('framework');
       foreach $chr ('Chr1','Chr2','Chr3') {
          $CURRENT_GROUP=$chr;
          @markers = extract("CHROM=$chr");
          redirect("./maps/$chr.out");
          print "Starting placement map for $chr.\n";
          create_placement_map($chr,@h);
          show_map;
       }
   }

You can either provide a bare file name for redirect() or provide a pipe of commands.

Utility Commands in the RHMAPPER Shell

get, set

The get and set commands get and set global variables:

        10> get alpha
        ALPHA = 0.0004
        11> set alpha 0.002
        ALPHA = 0.002

All the variables defined in the configuration file are available as global variables and can be get and set in this way. You can also set these variables in Perl style:

        12> $ALPHA = 0.0004
        13> print $ALPHA
        0.0004

There are two important differences between using get/set and setting them programatically. One is that get/set will automatically capitalize the global. Perl is case sensitive, so $alpha and $ALPHA are different variables. The second is that a side effect of get/set is to propagate the changes into the environment. This allows you to spawn processes that use the rhmapper libraries and have them use the same settings. In practice, it turns out that you don't need to do this terribly often.

help

Help prints out a list of commands recognized by the RHMAPPER shell in a one-line-per-command form. You can get information about a single command with:

      help command_name

or a list of all 69 commands known to RHMAPPER by typing help without an argument. This listing will be too long for your screen. I suggest you pipe it through more like this:

      help | more

or search it for keywords of interest with the grep command:

      help | grep framework

print, p

This evaluates and prints an expression. Either "print" or "p" will work.

ls

Do a directory listing as per the UNIX "ls" command. Recognizes all the same options as UNIX "ls" (because it shells out to the real "ls").

cd

Change the current working directory

pwd

Print the current working directory.

history

Prints out a history of your session. You can reinvoke command number 3 by typing !3:

       4> history
       Command history:
       1: set project genome
       2: set alpha 0.01
       3: set beta 0.002
       4: history

system

Invoke a system command.

shell

Start up a UNIX shell.

quit, q

Quit RHMAPPER.

The RHMAPPER Database

RHMAPPER keeps data about markers in several Perl databases and flat files. The data is organized by project under the DATA directory. For example, if you have projects named "genome", "test", and "low_res", the directory structure underneath DATA will look like:

  DATA --+
         |--low_res-+
         |          |-Keywords
         |          |-Marker_info
         |          |-Marker_names
         |          |-Pairs
         |          |-groups-+
         |          |        |-framework
         |          |
         |          +-maps-+
         |                 |-Chr18.map
         |                 |-Chr19.map
         |
         |--test----+
         |          |-Keywords
         |          |-Marker_info
         |          |-Marker_names
         |          |-Pairs
         |          |-groups-+
         |                   |-framework
         |
         |--genome--+
                    |-Keywords
                    |-Marker_info
                    |-Marker_names
                    |-Pairs
                    |-groups-+
                    |        |-working
                    |        |-framework
                    |        |-frame4.23.5
                    |
                    +-maps-+
                           |-Chr1.map
                           |-Chr3.map
                           |-Chr4.map
                           |-Chr5.map

The data itself is kept in files called "Keywords", "Marker_info", "Marker_names", and "Pairs" (the names may be slightly different depending on which Perl DBM implementation you use). Ordered groups of markers, usually framework maps, are kept in a subdirectory called "groups". Placement maps are kept in a subdirectory called "maps". RHMAPPER routines manage these directories for you, so ordinarily you won't need to deal with them (although we frequently find it easiest to edit the framework files directly).

To select the current project, set the global variable $PROJECT. The best way to do this is with the command set:

   1> set project chr18
   PROJECT = chr18

To view the current project, use the get command:

   1> get project chr18
   PROJECT = chr18

Loading Data

To load data into RHMAPPER, format it into the following flat file format:

NAME	       CHROM	RHVECTOR
CHLC.GATA11A06 Chr18	01110111100000110000001100110010001010101000
MR10918        Chr18	00100101100100100110001001100010011011110110
AFMC014WF9     Chr18	00100101100101100000001101101110111001110010
MR10399        Chr18	01111111100100010110001100110110010010111210
GATA-D18S850   Chr18	00001100100000100121001000100012011011100000
MR3430         Chr18	01010111100100110020001100120010011020111001

There is line one per marker, one column per field, and columns are separated by tabs (spreadsheet style). The top line should contain a series of field names.

There should be one column labeled NAME, and another labeled RHVECTOR. In addition to these two, you can place any other name fields you wish. By convention, we use "CHROM" for chromosomal assignment, "GDIST" for genetic position, and "RANK" for the subjective quality of the marker. If no column labels are present, RHMAPPER will assume that the first column is the marker name and the second one is the RHVECTOR.

RHMAPPER doesn't care about the format of names (although it is convenient if they're alphabetical and don't contain whitespace). The data vector should be composed of the following numeric codes only (no whitespace allowed):

0 -- marker negative on this hybrid
1 -- marker positive on this hybrid
2 -- missing data

At the Whitehead we use a "2" to indicate "don't know". Every experiment is repeated at least twice, and we use 2's for the marker/hybrid assays that showed discrepant results among the trials.

To create a new project and initialize it using data from a flat file, use the load_data command (ld for short).

   16> set project chr18
   PROJECT = chr18
   17> ld testdata/Chr18.dat
   ./bin/table2boulder.pl testdata/Chr18.dat | ./bin/load_rh_info -CLn
   ./DATA/chr18 doesn't exist. Creating directory.
   Loaded: 66
   Bring the pairwise table up to date? [y]: y
   ./bin/new_markers | ./bin/calculate_pairs -x | ./bin/load_pairs -L
   1512 pairs loaded.
   Setting errors to 0.002,0.0002,0.40.
   Deleting list of new markers.
   Resetting cache...

load_data will load the contents of the flat file into the RHMAPPER database, replacing whatever was there before. After loading, you will be asked if you want to bring the pairwise table up to date. If you answer yes to this, the system will calculate the linkage scores between all pairs in the data set. This is useful for finding linkage groups and during framework construction, because it saves a lot of time in later calculations. It can take a significant length of time to run on large data sets (1000 markers or more) and might best be left to run overnight. After frameworks are constructed, pairwise calculations are no longer particularly useful.

To add data to an existing database, use the add_data (ad) command. Markers with the same names as ones already in the database will be updated. Again you will be asked if you want to update the pairs database.

You can avoid having to answer the question about updating the pairs database by placing a 1 (yes) or a 0 (no) on the command line. This is useful if you want to load data as part of a batch script:

   19> add_data testdata/Chr18.dat 1

Querying the Database

The database has a simple query mechanism. To get data on one or more markers by name, use the get_data (gd) command:

  27> get_data MR3396 MR3430 318XD5
  NAME	RHVECTOR	CHROM
  MR3396         00011010100000000110001000000010000...   Chr18
  MR3430         01010111100100110020001100120010011...   Chr18
  318XD5         00111011100000000100001001000010001...   Chr18

By default, this will print a table of all the fields the database knows about, in load format. To restrict the fields displayed to a subset, pass get_data a list of the fields you're interested in prefixed by a hyphen:

   28> gd -CHROM MR3396 MR3430 318XD5
   NAME           CHROM
   MR3396         Chr18	
   MR3430         Chr18	
   318XD5         Chr18

Typing get_data by itself (or with a field modifier) dumps out the database for all markers.

Simple database searches are possible using the extract (ed) command. You can search for regular expressions in marker names or in selected fields. For example, to search for all markers beginning with the prefix AFM:

   15> ed NAME=^AFM
   NAME           RHVECTOR                       CHROM
   AFMB311WH5     001001011001011001100011011... Chr18
   AFMC014WF9     001001011001011000000011011... Chr18

To search for all markers in the data set beginning with the prefix ^MR and whose RHVECTORS contain the pattern 00000000001:

   17> ed NAME=^MR RHVECTOR=00000000001
   NAME    RHVECTOR                              CHROM
   MR7458  0011001210000000000000100100001000... Chr18
   MR6366  0011101110000000011000100100001000... Chr18
   MR6213  0011001110000000000000100100001001... Chr18

This command is mostly useful for extracting portions of the data set by chromosome or rank. The Perl style function call will return a list of the matching markers:

   18> @h=&extract('NAME=^MR','RHVECTOR=00000000001')

You can access the database programatically using the Perl subroutine marker_info():

   18> print marker_info('MR11061','CHROM')
   Chr18

Basic Mapping Commands

Two-Point Linkage

Use the linked command to determine whether two markers are linked:

  31> linked MR11061 HSC0NA032
  LOD=14.971257, THETA=0.190

When this command is used with Perl syntax, it returns the lod score and the breakage frequency as function results:

  32> ($lod,$freq) = linked('MR11061','HSC0NA032')

Searching the Database for Linked Markers

To search the database for all markers that are linked to another above a certain lod threshold (default 3.0), use the find_linked command. It uses the precomputed pairs database, and so is very fast. This command searches for all markers linked to MR11061 above a threshold of 10.0:

  35> find_linked MR11061 10
             MR10918	LOD=21.2	THETA=0.09
              312WE9	LOD=15.5	THETA=0.19
           HSC0NA032	LOD=15.0	THETA=0.19
         GATA-P28072	LOD=15.1	THETA=0.20
              164YA9	LOD=12.9	THETA=0.24
   CHLC.GATA8E05.437	LOD=12.6	THETA=0.25
   CHLC.GATA4H06.130	LOD=12.7	THETA=0.26
           HSC02E112	LOD=12.2	THETA=0.27
         GATA-P19271	LOD=11.8	THETA=0.28
              MR5846	LOD=11.7	THETA=0.28

When called as a Perl function, this returns a list of linked markers in descending order of lod significance.

Finding Linkage Groups

To perform a transitive closure across the entire data set and find all linkage groups, use the link_groups command. This command also uses the pairs database and is therefore quite fast. The value of the TWO_POINT_CUTOFF global determines the lod score at which this command will declare two markers to be linked. You can also set the value on the command line:

  42> link_groups 10
  TWO_POINT_CUTOFF = 10
  Found 2 linkage groups at LOD=10.
  Group 1: size 47
  MR10918 AFMC014WF9 GATA-D18S850 GATA-P6094 318XD5 MR3275 HSC0NA032
  MR7850 MR6366 MR3396 MR10303 MR4847 MR3759 MR7458 MR7679 GATA-P19271
  CHLC.GATA10A09.966 UTR-03540 GATA-P6006 A081TC5 242YF2
  CHLC.GATA4H06.130 312WE9 MR6213 164YA9 A131XH9 GATA-D18S849 MR3726
  CHLC.GATA8E05.437 MR9461 MR9606 CHLC.GATA2E06.13 HSC03A012 MR5846
  MR11061 HSC02E112 GCT3G01 EST106071 GCT-P10825 GATA-P18613
  GATA-P28072 GATA-P19280 UTR-03210 MR11742 MR10536 CHLC.GATA3E08.39
  AFMB311WH5
  Group 2: size 19
  CHLC.GATA11A06.668 MR10467 NIB1802 MR10399 MR3430 GCT-D18S852
  B082ZF1 MR5261 A152WG9 A058YG5 GATA-P6694 MR6657 MR10800 MR7774
  MR7497 MR6986 MR10742 MR12029 GCT5D07
  UNLINKED: 0

This command has the side effect of defining several marker groups named LINK_1, LINK_2, and so on, that can be manipulated with the various group commands described later. When called as a Perl function, link_groups returns an array of linkage groups, each of which is an array of marker names.

Multipoint Map Evaluation

The basic map evaluation function is evaluate:

  54> evaluate A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07
  NAME                   BREAK FREQ      cR
  A152WG9                   0.020        2.0
  MR7497                    0.070        7.3
  GCT-D18S852               0.118       12.6
  MR10399                   0.084        8.8
  GCT5D07                   0.000        0.0

  LIKELIHOOD = -71.634071
  MAP LENGTH = 30.61

Given a marker order, evaluate performs EM maximization and prints out a table showing the estimated breakage frequencies and the calculated likelihood. This calculation is subject to the convergence threshold as well as alpha and beta error rates:

  56> set CONVERGENCE_THRESHOLD 0.0001
  CONVERGENCE_THRESHOLD = 0.0001
  57> evaluate A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07
  NAME                   BREAK FREQ      cR
  A152WG9                   0.017        1.7
  MR7497                    0.068        7.0
  GCT-D18S852               0.115       12.2
  MR10399                   0.082        8.6
  GCT5D07                   0.000        0.0

  LIKELIHOOD = -71.624104
  MAP LENGTH = 29.53

When called in a Perl expression, evaluate returns the likelihood of the order, and an array of breakage frequencies:

  74> $QUIET=1;
  75> ($likelihood,@theta)=evaluate(A152WG9,MR7497,'GCT-D18S852',
  cont: MR10399,GCT5D07)
  76> print $likelihood
  -71.501194
  77> print "@theta"
  0.019 0.067 0.112 0.081

Creating and Manipulating Marker Groups

Groups are a useful RHMAPPER data type. A group is a named, ordered list of markers. Groups can be saved to a named file and reloaded, and most of the mapping functions accept groups as arguments. Some commands, such as find_link_groups generate groups automatically, An important special case of a group is the framework map.

In the current implementation of groups a given marker can only belong to one group at a time. This restriction may be relaxed in future versions.

Assigning a Set of Markers to a Named Group

The define_group (defg) command accepts the name of a group and a list of markers to assign to it:

  30> defg close_markers A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07

This places the named markers on a list named "good_markers". They will be removed from any groups they were already on. Two ways to do the same thing using Perl function calls are shown here:

  31> &define_group('good_markers','A152WG9','MR7497', \
  cont: 'GCT-D18S852','MR10399','GCT5D07')
  32> @h = qw/A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07/
  33> &define_group('good_markers',@h)

Once a group is defined, it can be used in any command that accepts a list of markers:

  38> evaluate close_markers
  CURRENT_GROUP = close_markers
  NAME                   BREAK FREQ      cR
  A152WG9                   0.020        2.0
  MR7497                    0.070        7.3
  GCT-D18S852               0.118       12.6
  MR10399                   0.084        8.8
  GCT5D07                   0.000        0.0

  LIKELIHOOD = -71.634071
  MAP LENGTH = 30.61

The global variable CURRENT_GROUP contains the group you are currently working with. If you don't provide a group name for commands that manipulate groups, they will assume the group named in this global. This example will have the same result as the previous one:

   39> set current_group close_markers
   CURRENT_GROUP = close_markers
   40> evaluate

For convenience, the set_group (setg) command is a shortcut for set current_group:

  41> set_group close_markers
  CURRENT_GROUP = close_markers

Group Manipulation Functions

There are several utility routines that allow you to perform various operations on the contents of groups:

intersect_groups group1 group2 destination: Find the intersection between group1 and group2 and store it in the destination group. This command can be abbreviated intg.
union_groups group1 group2 destination: Find the union of the two groups and store it in the destination. This command can be abbreviated uniong.
subtract_groups group1 group2 destination: Subtract group2 from group1 and store the result in destination. This can be abbreviated subg.
delete_group group1 group2 group3...: Completely delete the named groups.

View all Currently Defined Groups

The list_groups (listg) command will list all the currently defined groups:

  46> list_groups
  close_markers (5): A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07
  bad_markers (3): 318XD5 MR3396 MR3430

Saving Groups to a File

The save_groups (saveg) command saves the currently defined groups to a named file:

  49> save_groups working

This will save the list of groups into a file named working in the subdirectory ./DATA/chr18/groups/ (assuming that the current project is named "chr18"). You can get a listing of files containing saved groups with the list_saved_groups command (lsg):

  52> lsg
  total 1
  -rw-rw-r--   1 lstein   genome        90 Nov 14 10:44 working

Groups can be restored from a file using with the load_groups (loadg) command:

  51> loadg working
  2 groups loaded.

The format of the saved group files is dirt simple. There's one line per group. Each line starts with the group name, followed by a list of the markers in the group. You are free to edit the saved group file and then reload the groups. For example, here's what the working group file will look like after executing the commands above:

  bad_markers 318XD5 MR3396 MR3430
  close_markers A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07

If you give load_groups and save_groups a file path instead of a bare file name, the routines will load from and save to the desginated file rather than to the groups directory:

  52> loadg /home/gumbo/lstein/imported_frameworks

Manipulating Groups in Perl

The group() function takes the name of a group as an argument and returns a Perl list containing the marker names.

  58> @h=group(close_markers)
  59> p "@h"
  A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07

Finding the Order of Markers Within Groups

You will usually be concerned with finding the correct order of the markers in a group. As described above, the evaluate command will evaluate the likelihood of the order of markers in the current group. Building framework maps is discussed in a separate section, but there are several frequently-used commands that are useful when handling small groups or investigating problems in localized regions of larger maps.

permute

The permute command exhaustively checks all permutations of the given order, and prints out a summary giving the best and next best orders:

  65> permute close_markers
  A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07	-71.634071
  A152WG9 MR7497 GCT-D18S852 GCT5D07 MR10399	-64.999899
                 .
                 .
                 .
  MR10399 GCT-D18S852 MR7497 A152WG9 GCT5D07	-68.528327

  BEST ORDER: GCT-D18S852 GCT5D07 MR7497 A152WG9 MR10399
  NEXT BEST : A152WG9 MR7497 GCT5D07 GCT-D18S852 MR10399
  LOD vs nextbest = 1.934764

You can call it with a named group as shown here, or with a list of marker names. When called as a Perl function, permute will return the lod for the best order vs the next best order, followed by the best order:

  66> ($lod,@order) = permute('close_markers')

Because there are N!/2 permutations of a list of N markers, permute will take a very long time for lists greater than 7 in length.

ripple

The ripple command is useful for verifying the quality of candidate orders. It works by sliding a window across an order and finding all permutations or markers within that window, remembering the best order within that window. At the end of the process, it reports the best order and the next best order found. The arguments for this command are the window size (a size of 5 is the practical maximum), and a group name or list of marker names:

   51> ripple 3 close_markers
   CURRENT_GROUP = close_markers
   A152WG9 GCT-D18S852 MR7497 MR10399 GCT5D07	-77.081065
   MR7497 A152WG9 GCT-D18S852 MR10399 GCT5D07	-72.720026
     .
     .
     .
   GRAND BEST ORDER: A152WG9 MR7497 MR10399 GCT-D18S852 GCT5D07
   NEXT BEST ORDER: A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07
   LOD vs nextbest = 1.052755

We can deduce from this display that we should not be very confident of the order of markers in close_markers, as local permutations alone found another order that is only 10-fold less likely. In contrast, when this command is run on the best order found by exhaustive permutation search, the next best order found has a lod of 2.24, more than 100-fold less likely.

ripple_sort

ripple_sort is similar to ripple, but it modifies the order as it goes along. The algorithm finds all permutations of markers within the window. The best permutation is then inserted back into the order and the window position is advanced. At the end, the best order and the lod change from the previous order are reported.

One strategy for fine-tuning orders might be to repeat the ripple sort until the algorithm fails to find local improvements in the order. However, we have not used this technique to create our maps and don't have experience with it.

The function results for ripple_sort are the lod difference between the starting order and the final order, and the final order itself:

   61> ($change,@new_order) = ripple_sort(3,'close_markers')

Filling Gaps in Orders

It's often necessary to fill gaps in orders or to extend a partial order. The commands span and extend are useful tools for these tasks. span takes two markers as its arguments and tries to find a marker in the database that fits between them. The search looks for markers that are linked to both flanking markers, and are more likely to belong in the center than at either end by a factor determined by the TWO_POINT_CUTOFF global (usually set at 3.0):

  65> span MR6657 A058YG5 
  Found 53 markers linked to MR6657 or A058YG5
  A058YG5 MR6657 MR3275 : SIGNIFICANT BUT FLIPS
  A058YG5 MR6657 MR7458 : SIGNIFICANT BUT FLIPS
  A058YG5 MR6657 CHLC.GATA4H06.130 : SIGNIFICANT BUT FLIPS
  A058YG5 MR6657 MR6213 : SIGNIFICANT BUT FLIPS
  A058YG5 MR6657 HSC03A012 : SIGNIFICANT BUT FLIPS
  A058YG5 MR6657 GATA-P28072 : SIGNIFICANT BUT FLIPS
  The following orders span the markers given:
  MR6657 GATA-P6694 A058YG5 (4.76667400000001)
  MR6657 NIB1802 A058YG5 (3.99561999999999)
  MR6657 MR7774 A058YG5 (3.876677)

In this example, three markers were found that fit between MR6657 and A058YG5. Six more markers were found that have unique orders in the triple, but in each of these cases the best position was at the end of the order.

Use extend to extend the order of a partial order. This search will find markers that are linked to the right end of the order. It then determines all permutations of the partial order and reports success if it the chosen marker maps to the right end of the partial order:

  5> extend A058YG5 MR6657 MR7458
  Found 58 markers linked to MR7458
  A058YG5 MR10742 MR6657 MR7458 : SIGNIFICANT BUT FLIPS
  A058YG5 GATA-P6694 MR6657 MR7458 : SIGNIFICANT BUT FLIPS
  A058YG5 MR7774 MR6657 MR7458 : SIGNIFICANT BUT FLIPS
  A058YG5 NIB1802 MR6657 MR7458 : SIGNIFICANT BUT FLIPS
  The following orders extend the partial order:
  A058YG5 MR6657 MR7458 CHLC.GATA8E05.437 (4.907392)
  A058YG5 MR6657 MR7458 CHLC.GATA10A09.966 (4.70801200000001)
  A058YG5 MR6657 MR7458 HSC0NA032 (4.52770799999999)
  A058YG5 MR6657 MR7458 CHLC.GATA3E08.39 (4.49122799999999)
  A058YG5 MR6657 MR7458 164YA9 (4.47804500000001)
      .
      .
      .

The Perl function result of both these commands is a list of candidate markers.

Miscellaneous Group Commands

print_group

The command print_group will print out a table of the radiation hybrid vectors

  62> print_group close_markers
  CURRENT_GROUP = close_markers
  A152WG9      2.02  F  0110111110010011011000110...
  MR7497       7.26  F  0110111110010011011000110...
  GCT-D18S852 12.56  F  0110111110000001011000110...
  MR10399      8.77  F  0111111110010001011000110...
  GCT5D07      0.00  F  0110111110010001011000110...

display_group

display_group evaluates the group given on the command line (CURRENT_GROUP by default)p, then opens up a pipe to your selected display program (if any), and displays the map graphically in a pop-up window.

mail_group

mail_group evaluates the group given on the command line (CURRENT_GROUP by default), then opens up a pipe to the mail program on your system and mails a Macintosh PICT version of the map to the login user:

  63> mail_group Chr18.framework

Building Framework Maps

We define a framework map as an ordered group of markers for which any local permutation (as defined by the ripple test), will reduce the likelihood of the map by a predefined threshold. The threshold we use at the Whitehead is 2.5, but 3.0 frameworks are more common.

There are several approaches to building frameworks. One is to use external information, such as known genetic order, to nucleate the framework. Once the framework is started, it is easy to grow it by incrementaly adding in markers. See growing an existing framework.

If external ordering information isn't available to get the framework started, you can assemble the framework from scratch by finding a set of well-ordered triples which can be assembled into a well-ordered framework.

Finding Well-Ordered Triples

Ignoring trivial flips, there are three possible orders for a group of three markers A, B and C:

    A-B-C
    B-A-C
    A-C-B

The strategy is to find all the triples in the data set for which one of these three orders is more likely than the others by a high lod score. The triples can then be assembled into a framework as illustrated here:

Triple 1:   A-B-C
Triple 2:     B-C-D
Triple 3:       C-D-E
Triple 4:         D-E-F

Deduced Order: A-B-C-D-E-F

Because framework building is sensitive to a number of parameters, our strategy is to find all the well-ordered triples in a data set, save them to a file, and then apply various combinations of frame building algorithms and parameters to the file.

The command find_triples is used to search the data set for well-ordered triples. The command's format is:

    find_triples

You provide the name of a file (absolute or relative) to save the triples to and a list of markers to examine. This can be a named group, such as a group created by the find_link_groups command or a list of markers typed on the command line. The equivalent Perl function accepts a Perl list for this argument. For example, here's how to find all the well-ordered triples in the entire data set and dump them out to a file named "triples":

   25> set project chr14
   26> loadg working
   27> find_triples ./triples LINK_1
   processing pairs ...
   After 5000 triples, 
   The best triple in database  is:
   CHLC.GATA11A06.668 GCT-D18S852 MR5261	-79.322560
   CHLC.GATA11A06.668 MR5261 GCT-D18S852	-78.088223
   MR5261 CHLC.GATA11A06.668 GCT-D18S852	-71.741460
   with a lod score of 6.346763
        .
        .
        .
   
   Finally:  
   The best triple in database  is:
   MR3396 MR7679 MR3726	-79.840608
   MR3396 MR3726 MR7679	-78.672711
   MR3726 MR3396 MR7679	-70.772235
   with a lod score of 7.90047600000001

find_triples is sensitive to the value of the global parameter FRAMETHRESH (we usually use 3.0). Triples for which the next best order is within FRAMETHRESH of the best order are discarded.

This command takes a while to run: a set of 100 marekrs can take several hours on a DEC alpha, and the running time increases dramatically with more markers. You can reduce running time and increase the ultimate map quality by preselecting your best markers to use for framework building. A good idea would be to select markers containing no missing data.

The format of the file written by find_triples should be self-explanatory.

Building Frameworks from Triple Files

Once the list of triples is saved to a file, you can use the assemble_framework1 (af1) or assemble_framework2 (af2) commands to assemble the triples into a framework.

The two commands have the same arguments and produce the same output, but use slightly different algorithms. The assemble_framework1 seeks to find the longest path involving overlapping triples such that the last two markers of the left triple overlaps with the first two markers of the right triple, for all adjacent triples in the path. During construction, triples which are ordered at a lod of less than TRIPLE_LOD, or which contain gaps in excess of TRIPLE_DIST centirays are discarded. This criterion is very stringent and the paths found using this algorithm tend to be short but reliable.

assemble_framework2 takes a more aggressive approach by using all the ordering information in the triples to assemble a path. For example the following pair of triples will be assembled by assemble_framework2:

      A-B-C
      A - C-D

Will be assembled into:
      A-B-C-D

assemble_framework2 tends to find longer paths. However the paths may not be as reliable (this algorithm has not be extensively tested yet). This command also respects the values of TRIPLE_LOD and TRIPLE_DIST.

Both these commands are invoked in the same way:

  5> assemble_framework1 triples
  Loading triples...
  
  There are 298 triples ordered at lod 4.0 and with no gaps larger than 30 cR.
  Now searching for frameworks...

  Found a path containing 6 markers, consisting of the triples: 
  EST106071 MR3396 GATA-P6094 6.839365
  MR3396 EST106071 312WE9 4.889032
  164YA9 312WE9 EST106071 5.579925
  312WE9 164YA9 MR10303 4.21074499999999

  Path: GATA-P6094 MR3396 EST106071 312WE9 164YA9 MR10303

  Found a path containing 6 markers, consisting of the triples: 
  MR10918 CHLC.GATA4H06.130 MR3275 4.607503
      .
      .
      .

Overall longest path contains 12 markers.  
Longest Path:  MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 MR7679 \
        MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 GATA-P19280

The command prints out all paths that it finds: this is often copious. At the end the single path found will be printed. When called with Perl syntax, the subroutine will return the longest path found. This can then be assigned to a named group and used as the basis for growing a longer framework.

I suggest that you run the ripple test on any framework that these algorithms find. You should also evalute the map to make sure that the framework consists of evenly spaced markers:

  7> ripple 3 MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 MR7679 \
  cont: MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 GATA-P19280
  LOD vs nextbest = 4.30339899999998
  8> evaluate MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 \
  cont: MR7679 MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 GATA-P19280
  NAME                   BREAK FREQ      cR
  MR11742                   0.108       11.4
  GATA-D18S850              0.206       23.1
  GCT3G01                   0.180       19.9
  UTR-03540                 0.210       23.6
  MR3396                    0.239       27.3
  MR7679                    0.164       17.9
  MR3275                    0.164       17.9
  A131XH9                   0.134       14.4
  MR5846                    0.156       17.0
  CHLC.GATA8E05.437         0.181       20.0
  A081TC5                   0.149       16.1
  GATA-P19280               0.000        0.0

  LIKELIHOOD = -219.031954
  MAP LENGTH = 208.5
  9> define_group chr18 MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 \
  cont: MR7679 MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 GATA-P19280
  10> save_group working

Growing Frameworks

Once you have a good framework it is straightforward to add markers to it. The command grow_framework takes a set of markers and adds them to the framework. The algorithm it uses is as follows:

For each marker provided

Try every possible position in the framework and find the best position.
Reject the marker if there is no unique position with lod score greater or equal to a lod cutoff.
Ripple the resulting order using windows of 2 and 3. Reject the new order if the ripple finds an alternative order with lod score below the FRAMETHRESH cutoff.
If the new order survives the ripple test, keep it and proceed to the next marker.

The calling parameters for grow_framework are:

   grow_framework

The first parameter is the name of the framework group to try to grow. This should be followed either by a named group or by a list of markers.

If any of the markers provided are already on the framework, you will see a warning message and the marker will be skipped. This command is most useful as a Perl function call where it will return the new framework as the function result.

For example, here's how to add all markers in LINK_1 group to an existing framework group named "chr18" at lod 3.0:

   7> set framethresh 3.0
   FRAMETHRESH = 3.0
   8> loadg working
   2 groups loaded.
   9> grow_framework chr18 LINK_1
   ** 318XD5 **
      Stepwise...
      Rippling pairs...
      Rippling triples...
      SUCCEEDED: MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 \
        318XD5 MR7679 MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 \
        GATA-P19280
      LOD: 3.44092600000002
   ** GATA-P6094 **
      Stepwise...
              ...failed with lod 0.278309999999976
   ** HSC0NA032 **
      Stepwise...
      Rippling pairs...
      Rippling triples...
      SUCCEEDED: MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 318XD5 \
        MR7679 MR3275 A131XH9 MR5846 HSC0NA032 CHLC.GATA8E05.437 A081TC5 \
        GATA-P19280
      LOD: 3.443511
         .
         .
         .
   TRIED: 54
   ADDED: 7
   FINAL ORDER: MR11742 MR9606 GATA-D18S850 MR4847 GCT3G01 UTR-03540 \
     MR3396 MR6366 318XD5 MR7679 GATA-P18613 MR6213 MR3275 A131XH9 MR5846 \
     HSC0NA032 CHLC.GATA8E05.437 A081TC5 GATA-P19280
   Updating chr18...
   11> save_groups working

If you've been following the examples so far, the framework map obtained in the previous step contains the Q arm of Chr18. In order to finish the map, it would be necessary to work separately on the other linkage group and then knit the two together using external information.

Building Placement Maps

In any mapping project a certain number of markers cannot be placed in the framework because they are too close to adjacent markers. These markers must be binned in a "placement map". Placement maps consist of a framework map plus a set of markers that are placed relative to the framework. For each placed marker there is a most likely interval in which the marker belongs, and (often) several other candidate intervals which are less likely.

Creating a placement map is straightforward. The easiest way to do it is with the create_placement_map command. This command accepts the name of the new placement map and a list of markers to place on it. It gets the name of the framework to use to start the map from the CURRENT_GROUP global variable. Here's how to create a placement map named "Chromosome18" from a framework map named "chr18":

  13> load_groups working
  14> set_group chr18
  15> create_placement_map Chromosome18 LINK_1

This command will create a placement map named "Chromosome18" and store it in the global variable CURRENT_MAP. As the command runs, it attempts to place all markers in group 'LINK_1' onto the map; as it runs it displays the alternative placements for each marker. Markers that map well are added to the placement.

In addition to specifying the markers to map in a named group, you can present a list:

    create_placement_map Chromosome18 HSC0NA032 A081TC5 GATA-P19280
                        -or-
    create_placement_map('Chromosome18',group(LINK_1))

create_placement_map is sensitive to the value of PLACEMENT_TOO_FAR. If a marker maps off the end of the framework, it will be discarded if the distance exceeds the value of this global (15 cR by default). This catches and discards markers containing many errors.

Once created, the current placement map can be printed in tabular form to the screen with print_map, viewed graphically in a pop up window with display_map, or converted to Macintosh PICT format and mailed to the current user with mail_map.

Here is an example of a graphical map created by mail_map and display_map:

In displays produced by this program, boldfaced names are framework markers, others are placed markers.

Saving and Restoring Placement Maps from Files

Placement maps can be saved to a file using save_map (savem), and restored from a file using load_map (loadm). Each of these commands takes a file name as the argument:

  17> save_map chr18.map
  18> load_map /home/lstein/old/chr18.map

If a bare filename is used (no slashes), the map is stored in a subdirectory of the project called "maps". Otherwise, the provided pathname is used. In the first example above, the map is saved to the file

$DATA/chr18/maps/chr18.map

(where $DATA is the database directory). In the second example, another map is restored from the file /home/lstein/old/chr18.map

list_saved_maps (listm) will list all maps saved in the "maps" directory.

Adding to an Existing Placement Map

The add_to_map command can be used to add a marker or set of markers to an existing placement map:

  19> load_map chr18.map
  20> add_to_map MR10399 318XD5 MR3396
  21> save_map chr18.map

Error Detection

Given a marker order, find_errors flags marker/hybrid assay results that are likely to be the result of laboratory error. The first argument to this command is the desired lod of error threshold, followed by a list of markers or a group name:

  4> find_errors 3 GCT-D18S852 GCT5D07 MR7497 A152WG9 MR10399
  GCT-D18S852         011011111000000101100110110010000111100101011000011...
  GCT5D07             011011111001000101100110110010010111100101011000011...
  MR7497              011011111001001101100110110010010111000101001000011...
                                                                 ^
  A152WG9             011011111001001101100110110010010111010101011000011...
  MR10399             011111111001000101100110110010010111210101111000011...

In this example, the caret marks a single hybrid in marker MR7497 in which there is a 1-0-1 transition. Because a threshold of 3 was specified in the command RHMAPPER is reporting that this result is 1000:1 more likely to represent a laboratory error than a true result. (The results for MR7497 were deliberately altered in order to create this example -- it won't work in the test data set provided with RHMAPPER). In order for these results to be meaningful, the order must be essentially correct.

Called in Perl function call style, find_errors() returns an associative array in which the keys are the names of the flagged markers. The values are associative arrays in which the keys are the hybrid numbers and the values are the lod of error scores:

  7> $QUIET=1; %errors = find_errors(3,'Chr18.framework');
  8> foreach $marker (keys %errors) {
  cont:   foreach $hybrid (keys %{$errors{$marker}}) {
  cont:      print "ERROR: $marker on $hybrid, LOD $errors{$marker}->{$hybrid}\n";
  cont:   }
  cont: }

Using the place_markers Script

place_markers, a Perl script that you will find in the bin/ subdirectory, is a quick way of placing your markers relative to the Whitehead framework maps. Given a table of marker names and their RH vectors (formatted in Genebridge 4 order, see Genebridge Panel), this program will determine and print out the placement position of each marker along with graphical maps showing its position.

To use it, you must have RHMAPPER installed and running. You'll first need to create and load a project named "whitehead", using the data provided in the testdata/ subdirectory:

zorro: rhmapper
1> set project whitehead
PROJECT = whitehead
2> ld testdata/wiframeworks.dat
./bin/table2boulder.pl testdata/wiframeworks.dat | ./bin/load_rh_info -CLn
Assuming first two columns columns are marker name and vector
Loaded: 1630
Bring the pairwise table up to date? [y]: n
3> quit

In this example only the subset of the Whitehead markers that correspond to the framework maps was loaded. The rest are not needed for place_markers to run. You do not need to bring the pairwise comparisons table up to date either, although you can do so (this will take several hours to overnight).

To run place_markers, create a file that looks like this:

HK45-L 0111010112001110101101101001010101111100...
MBQH2  0110002000000001111101111001001120001010...
Glue2  1111001110100101001101010011101120011010...

The first column should contain the names of the marker's you're mapping. The second column should contain the RH vectors in Genebridge 4 order. You may introduce spaces in the vector to increase readability.

Now run the file through place_markers in this way:

zorro% bin/place_markers <datafile
NAME          Chromosomal Assignment and Placement(s)
----          ---------------------------------------
HK45-L        Chromosome Chr9
              Places 70.52 cR from WI-3028 (lod >3.0)

MBQH2         Chromosome Chr4
              Places 0.00 cR from WI-4262 (lod >3.0)

Glue2         ** No linkages at lod 15 **

PICTURES, if requested, are attached to the bottom of this message.

Submitted markers in the context of Whitehead framework map:

Placement Data Relative to Framework for Chromosome Chr4
Name        Dist   Type   Vector
----        ----   ----   ------
WI-6657        8.2 F      010100010100000001001011011011010100...
WI-5430        4.7 F      000100010100100001001011011010011100...
D4S2925        5.4 F      000000010100100001000011010010011100...
CHLC.GATA42   11.6 F      010000010100100001000011010010011100...

place_markers has many command line switches to control its behavior. Among the more useful are:

-p project: Set the project to use ("whitehead" by default)
-f group: Framework to use ("framework" by default)
-l number: lod threshold at which to call a chromosomal linkage (15.0 by default)
-g: Create a picture of the mapped markers in uuencoded GIF format.
-q: Create a picture of the mapped markers in BinHex'd PICT format.

Use the -h switch to get a summary of other features and settings.

License Agreement

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Redistributions of source code must also reproduce this information in the source code itself.
If the program is modified, redistributions must include a notice (in the same places as above) indicating that the redistributed program is not identical to the version distributed by Whitehead Institute.
All advertising materials mentioning features or use of this software must display the following acknowledgement:
This product includes software developed by the Whitehead Institute for Biomedical Research.
The name of the Whitehead Institute may not be used to endorse or promote products derived from this software without specific prior written permission.

We request that users of this software inform us by sending email to software_registration@genome.wi.mit.edu.

We also request that use of this software be cited in publications as:

L. Stein, L. Kruglyak, D. Slonim and E. Lander. (1995) "RHMAPPER", unpublished software, Whitehead Institute/MIT Center for Genome Research. Available at http://www.genome.wi.mit.edu/ftp/pub/software/rhmapper/, and via anonymous ftp to ftp.genome.wi.mit.edu, directory /pub/software/rhmapper.

THIS SOFTWARE IS PROVIDED BY THE WHITEHEAD INSTITUTE ``AS IS'' AND  ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE  IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE  ARE
DISCLAIMED. IN NO EVENT SHALL THE WHITEHEAD INSTITUTE BE LIABLE  FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL  DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS  OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

References

D. Cox, M. Burmeister, et al. (1990). "Radiation Hybrid Mapping: A Somatic Cell Genetic Method for Constructing High-Resolution Maps of Mammalian Chromosomes. Science 250: 245-250.
M. James, C. Richard III, et al. (1994). "A radiation hybrid map of 506 STS markers spanning human chromosome 11". Nature Genetics 8: 70-75.
T. Hudson, L. Stein, et al. (1995). "An STS-Based Map of the Human Genome", Science (in press).
M. Boehnke, K. Lange, and D. Cox (1991). "Statistical methods for multipoint radiation hybrid mapping. Am. J. Hum. Genet. 49: 1174-1188.
D. Slonim, L. Stein, L. Kruglyak and E. Lander (1995). "RHMAPPER: An Interactive Computer Package for Constructing Radiation Hybrid Maps". In preparation.

Appendix: The Genebridge4 Hybrid Panel Order

Names of Genebridge Cell Lines used for RH mapping
Index	Cell Line	Index	Cell Line	Index	Cell Line	Index	Cell Line
1	4A4	21	4E2	41	4J9	61	4P11	81	4V2
2	4A5	22	4E4	42	4K5	62	4Q2	82	4V3
3	4AA5	23	4E6	43	4K7	63	4Q4	83	4V7
4	4AA7	24	4E11	44	4K8	64	4R1	84	4V8
5	4B2	25	4F6	45	4K9	65	4R2	85	4W1
6	4B3	26	4F7	46	4K12	66	4R3	86	4Y4
7	4B9	27	4F13	47	4L3	67	4R5	87	4Y8
8	4BB1	28	4G1	48	4L4	68	4R6	88	4Y9
9	4BB6	29	4G5	49	4L6	69	4R10	89	4Z5
10	4BB8	30	4G6	50	4M4	70	4R12	90	4Z6
11	4BB10	31	4G7	51	4M5	71	4S3	91	4Z9
12	4BB12	32	4G11	52	4N3	72	4S6	92	4Z11
13	4C3	33	4H1	53	4N5	73	4S10	93	4Z12
14	4C11	34	4H8	54	4N6	74	4S12
15	4CC8	35	4H9	55	4N7	75	4T3
16	4D1	36	4H12	56	4N12	76	4T4
17	4D7	37	4I1	57	4O5	77	4T10
18	4DD2	38	4I4	58	4O10	78	4T11
19	4DD5	39	4J2	59	4P2	79	4U1
20	4DD8	40	4J5	60	4P9	80	4U3

Lincoln D. Stein, lstein@genome.wi.mit.edu
Whitehead Institute/MIT Center for Genome Research

Last modified: Fri Jan 9 17:11:06 EST 1998