rhmapserver
. The compiler that
comes with SunOS is non-standard and is known to have problems
compiling rhmapserver
. The server compiles well with the
freely-available gcc compiler, as well as with almost all other C
compilers. You'll also need a C compiler to compile the BinHex
utility described below.
Some architectures require you to install the Ioctl module in order for Term::Readline to work. That's included too.
ftp://ftp.cs.berkeley.edu/ucb/4bsd/db.tar.Z
BinHex was written at the Whitehead, and is included in this package.
ftp://ftp.x.org/R5contrib/netpbm-1mar1994.tar.gz
There is a bug in the distribution of picttoppm
that prevents the -fontdir
switch from working
properly (this is used to direct picttoppm to display certain
X11 fonts for Macintosh fonts). A patch
to picttoppm.c
that fixes this bug is included in
this package.
picttoppm
and one of these displayers,
RHMAPPER can pop up a window to display a map.
Locations:
rhmapper
directory and edit the
top level Makefile to suit your preferences. The following settings are defined:
CC
OPTS
PERL
INSTALLDIR
INSTALL
If you wish to install BinHex, you should now enter the BinHex directory and examine the Makefile, making the appropriate changes. You'll have to adjust one define to suit the endianness of your machine, as explained in the Makefile.
/usr/local/bin/perl5
. If Perl is located somewhere
else, you'll have to change the path name on the top line of
rhmapper
and all scripts ending in the suffix
.pl
. Most sites use /usr/local/bin/perl instead of
/usr/local/bin/perl5. It may be easiest just to create a link
from perl5 to perl.
rhmapper.conf.in
to the global configuration file
rhmapper.conf
. You should review the settings in
rhmapper.conf
and adjust them appropriately.
The settings are all PARAMETER=VALUE pairs. Whitespace before or after
the = sign is ignored. Use single and double-quotes if you need to
include whitespace within the value (shouldn't be necessary). The
parameters are used to establish defaults when rhmapper
starts up. Their values are accessible from within the interpreter,
and they can be examined and set while the program is running.
Certain parameter values have special meanings:
Two new features in version 1.22 make it easier to deal with rhmapper's various settings. First, each user of the software may have a personal settings file, ordinarily stored in his home directory in a file named .rhmapper. Any or all parameters defined in the global settings file can be overriden in this personal file. This allows users to maintain their own private project databases, or for groups of users to share the same project database.DATAFILES = %r/%u-DATA
Secondly, project-specific settings, including the error rate, retention frequency and parameters that adjust the way the mapping software works, may be saved to the project database automatically when the user leaves the project, and restored when the user reopens the project. The settings are user-specific so that one user's preferences do not interfere with another's.
Both of these new options can be disabled if need be.
USE_READLINE
DATAFILES
rhmapper
tree.
Ordinarily this directory is called DATA. Set this parameter
to any absolute or relative path in order to change the default
location.
USER_CONF
PERSISTENT_PARAMETERS
RHMAPSERVER
rhmapserver
program.
Ordinarily it is kept in the subdirectory bin
within the rhmapper
tree. Change this to any
absolute or relative address of your choice.
RHMAPPORT
and RHMAPHOST
/etc/services
and
inetd.conf
files on the server host and will be
documented when we get closer to a true release of this
software.
ALPHA
and BETA
RETENTION_FREQUENCY
,
CALCULATE_RF_THRESHOLD
, and
ALWAYS_USE_UNIFORM_RF
RHMAPPER can be set to assume that retention frequencies are constant among all the hybrids (ALWAYS_USE_UNIFORM_RF set to 1), or can be instructed to calculate retention frequencies for each individual member of the hybrid panel. Unfortunately this slows down the calculation somewhat so we don't have extensive experience mapping under this assumption. I recommend using constant retention frequencies until we have more experience with the effects of the non-uniform setting.
MISSING_FREQUENCY
CONVERGENCE_THRESHOLD
TWO_POINT_CUTOFF
RHMAPPER
store
lod scores for linkage between marker pairs. It's rarely
useful to store non-significant lod scores, and
TWO_POINT_CUTOFF determines the cutoff value (lod 3.0 by
default).
FRAMETHRESH
PLACEMENT_LINKAGE
RHMAPPER
will discard any marker that doesn't link
to at least one framework marker with the lod score specified
in this parameter (default 5.0).
PLACEMENT_CUTOFF
RHMAPPER
reports all the alternative placements for a marker down to
some log-likelihood difference level. This determines the
cutoff. The default is 3.0, meaning that placements that are a
lod of 3.0 or greater less likely than the
most likely position will not be reported.
PLACEMENT_TOO_FAR
RHMAPPER
will reject any marker which maps too far off one of the ends of
the framework. Such markers usually contain errors. This
variable determines how far is too far, and is set to 15 cR by
default.
PLACEMENT_EXPANSION_TOLERANCE
PLACEMENT_UNIQUE
PLACEMENT_EXPANSION_TOLERANCE
PICTTOPPM
XV
MAIL
BINHEX
DEBUG
RHMAPPER
rhmapper
Directoryrhmapper*
rhmapper
shell.
rhmapper.conf
DATA/
util/
rhmapper
and other programs.
bin/
rhmapserver
executable and a number of
external Perl scripts called by rhmapper
to
perform useful tasks.
One of these scripts, place_markers, can be used to map your data to the Whitehead frameworks without learning any of the details of RHMAPPER. See Using the place_markers Script for more details.
server/
rhmapserver
.
testdata
local_hacks/
prego% rhmapperAfter a short delay during while
rhmapper
starts up
and initializes the server, you will be greeted with a command line
prompt:
1> print "Hello world!" Hello world! 2>You can now interact with the shell, typing commands and getting responses.
The interpreter maintains a history list of all previous commands. You can move through the history list with the up and down cursor keys, edit previous commands and re-execute them by hitting return. In addition, you can re-execute commands by typing an exclamation point (!) followed by the number of the command to repeat. You can view the complete history of the session with the command history (this works even if Term::Readline is not installed).
You can type a portion of a command followed by the tab key and the interpreter will attempt to complete it for you. If there are several possible completions, the interpreter will display the alternatives. As of version 1.22, a limited form of context-dependent command completion is also performed. For example, if you type a command that ordinarily operates on groups and press the tab key, the interpreter will display all the current groups.
For example, the "evaluate" command is used to evaluate a potential marker order and give the likelihood score and breakeage frequencies:
4> evaluate A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07 NAME BREAK FREQ cR A152WG9 0.035 3.6 MR7497 0.086 9.0 GCT-D18S852 0.118 12.6 MR10399 0.084 8.8 GCT5D07 0.000 0.0 LIKELIHOOD = -72.699698 MAP LENGTH = 33.88However, evaluate can also be used with perl syntax, in which case it returns the likelihood score and the breakeage frequencies as function results:
5> ($like,@theta)=evaluate('MR7774','MR6986','A058YG5','AFM254YD5')Note that single quotes around string constants are recommended for marker names. Otherwise names like WI-453 will confuse Perl into trying a subtraction.
Many frequently-used commands have shortcut abbreviations. For example, the "load_data" command can be abbreviated "ld".
Most of the built-in commands produce printed output. This can be undesirable when incorporating them into subroutines. Output can be suppressed by setting the global variable $QUIET to a nonzero value:
5> $QUIET=1; 6> ($like,@theta)=evaluate('MR7774','MR6986','A058YG5','AFM254YD5') 7> $QUIET=0;In order to allow you to enter Perl expressions that span more than a line (such as subroutines), the shell defers execution until the expression is "complete", meaning that the number of open and close brackets, braces and parenthesis are balance. Therefore this will work:
5> sub count_ones { cont: my $vector = shift; cont: my $count = $vector=~tr/1/1/; cont: print "$count ones\n"; cont: } 6> count_ones 10101010110101010101 9 onesYou can use a backslash to force continuation in circumstances in which the interpreter would screw with simple bracket counting:
7> sub count_ones \ 8> { my $vector = shift; ....Type // or q to back out of a continuation you wish to abort.
8> evaluate MR7774 MR6986 A058YG5 AFM254YD5 > ./temporary.file 9> evaluate MR7774 MR6986 A058YG5 AFM254YD5 | mail lsteinYou can execute any UNIX shell command by issuing the "system" command:
1> system df . Filesystem 512-blks used avail capacity gumbo:/usr/home/ 1927324 1849584 58468 97%Type "system" by itself to start up a UNIX shell.
The ">" and "|" redirection commands do not work while you're
inside a Perl subroutine -- they only apply to commands typed at the
top level of the RHMAPPER shell. However, a utility
subroutine redirect()
lets you achieve the same effect:
sub remap { load_groups('framework'); foreach $chr ('Chr1','Chr2','Chr3') { $CURRENT_GROUP=$chr; @markers = extract("CHROM=$chr"); redirect("./maps/$chr.out"); print "Starting placement map for $chr.\n"; create_placement_map($chr,@h); show_map; } }You can either provide a bare file name for redirect() or provide a pipe of commands.
10> get alpha ALPHA = 0.0004 11> set alpha 0.002 ALPHA = 0.002All the variables defined in the configuration file are available as global variables and can be get and set in this way. You can also set these variables in Perl style:
12> $ALPHA = 0.0004 13> print $ALPHA 0.0004There are two important differences between using get/set and setting them programatically. One is that get/set will automatically capitalize the global. Perl is case sensitive, so $alpha and $ALPHA are different variables. The second is that a side effect of get/set is to propagate the changes into the environment. This allows you to spawn processes that use the rhmapper libraries and have them use the same settings. In practice, it turns out that you don't need to do this terribly often.
help command_nameor a list of all 69 commands known to RHMAPPER by typing help without an argument. This listing will be too long for your screen. I suggest you pipe it through more like this:
help | moreor search it for keywords of interest with the grep command:
help | grep framework
4> history Command history: 1: set project genome 2: set alpha 0.01 3: set beta 0.002 4: history
DATA --+ |--low_res-+ | |-Keywords | |-Marker_info | |-Marker_names | |-Pairs | |-groups-+ | | |-framework | | | +-maps-+ | |-Chr18.map | |-Chr19.map | |--test----+ | |-Keywords | |-Marker_info | |-Marker_names | |-Pairs | |-groups-+ | |-framework | |--genome--+ |-Keywords |-Marker_info |-Marker_names |-Pairs |-groups-+ | |-working | |-framework | |-frame4.23.5 | +-maps-+ |-Chr1.map |-Chr3.map |-Chr4.map |-Chr5.mapThe data itself is kept in files called "Keywords", "Marker_info", "Marker_names", and "Pairs" (the names may be slightly different depending on which Perl DBM implementation you use). Ordered groups of markers, usually framework maps, are kept in a subdirectory called "groups". Placement maps are kept in a subdirectory called "maps". RHMAPPER routines manage these directories for you, so ordinarily you won't need to deal with them (although we frequently find it easiest to edit the framework files directly).
To select the current project, set the global variable $PROJECT. The best way to do this is with the command set:
1> set project chr18 PROJECT = chr18To view the current project, use the get command:
1> get project chr18 PROJECT = chr18
NAME CHROM RHVECTOR CHLC.GATA11A06 Chr18 01110111100000110000001100110010001010101000 MR10918 Chr18 00100101100100100110001001100010011011110110 AFMC014WF9 Chr18 00100101100101100000001101101110111001110010 MR10399 Chr18 01111111100100010110001100110110010010111210 GATA-D18S850 Chr18 00001100100000100121001000100012011011100000 MR3430 Chr18 01010111100100110020001100120010011020111001There is line one per marker, one column per field, and columns are separated by tabs (spreadsheet style). The top line should contain a series of field names.
There should be one column labeled NAME, and another labeled RHVECTOR. In addition to these two, you can place any other name fields you wish. By convention, we use "CHROM" for chromosomal assignment, "GDIST" for genetic position, and "RANK" for the subjective quality of the marker. If no column labels are present, RHMAPPER will assume that the first column is the marker name and the second one is the RHVECTOR.
RHMAPPER doesn't care about the format of names (although it is convenient if they're alphabetical and don't contain whitespace). The data vector should be composed of the following numeric codes only (no whitespace allowed):
To create a new project and initialize it using data from a flat file, use the load_data command (ld for short).
16> set project chr18 PROJECT = chr18 17> ld testdata/Chr18.dat ./bin/table2boulder.pl testdata/Chr18.dat | ./bin/load_rh_info -CLn ./DATA/chr18 doesn't exist. Creating directory. Loaded: 66 Bring the pairwise table up to date? [y]: y ./bin/new_markers | ./bin/calculate_pairs -x | ./bin/load_pairs -L 1512 pairs loaded. Setting errors to 0.002,0.0002,0.40. Deleting list of new markers. Resetting cache...load_data will load the contents of the flat file into the RHMAPPER database, replacing whatever was there before. After loading, you will be asked if you want to bring the pairwise table up to date. If you answer yes to this, the system will calculate the linkage scores between all pairs in the data set. This is useful for finding linkage groups and during framework construction, because it saves a lot of time in later calculations. It can take a significant length of time to run on large data sets (1000 markers or more) and might best be left to run overnight. After frameworks are constructed, pairwise calculations are no longer particularly useful.
To add data to an existing database, use the add_data (ad) command. Markers with the same names as ones already in the database will be updated. Again you will be asked if you want to update the pairs database.
You can avoid having to answer the question about updating the pairs database by placing a 1 (yes) or a 0 (no) on the command line. This is useful if you want to load data as part of a batch script:
19> add_data testdata/Chr18.dat 1
27> get_data MR3396 MR3430 318XD5 NAME RHVECTOR CHROM MR3396 00011010100000000110001000000010000... Chr18 MR3430 01010111100100110020001100120010011... Chr18 318XD5 00111011100000000100001001000010001... Chr18By default, this will print a table of all the fields the database knows about, in load format. To restrict the fields displayed to a subset, pass get_data a list of the fields you're interested in prefixed by a hyphen:
28> gd -CHROM MR3396 MR3430 318XD5 NAME CHROM MR3396 Chr18 MR3430 Chr18 318XD5 Chr18Typing
get_data
by itself (or with a field modifier)
dumps out the database for all markers.
Simple database searches are possible using the extract (ed) command. You can search for regular expressions in marker names or in selected fields. For example, to search for all markers beginning with the prefix AFM:
15> ed NAME=^AFM NAME RHVECTOR CHROM AFMB311WH5 001001011001011001100011011... Chr18 AFMC014WF9 001001011001011000000011011... Chr18To search for all markers in the data set beginning with the prefix ^MR and whose RHVECTORS contain the pattern 00000000001:
17> ed NAME=^MR RHVECTOR=00000000001 NAME RHVECTOR CHROM MR7458 0011001210000000000000100100001000... Chr18 MR6366 0011101110000000011000100100001000... Chr18 MR6213 0011001110000000000000100100001001... Chr18This command is mostly useful for extracting portions of the data set by chromosome or rank. The Perl style function call will return a list of the matching markers:
18> @h=&extract('NAME=^MR','RHVECTOR=00000000001')You can access the database programatically using the Perl subroutine marker_info():
18> print marker_info('MR11061','CHROM') Chr18
31> linked MR11061 HSC0NA032 LOD=14.971257, THETA=0.190When this command is used with Perl syntax, it returns the lod score and the breakage frequency as function results:
32> ($lod,$freq) = linked('MR11061','HSC0NA032')
35> find_linked MR11061 10 MR10918 LOD=21.2 THETA=0.09 312WE9 LOD=15.5 THETA=0.19 HSC0NA032 LOD=15.0 THETA=0.19 GATA-P28072 LOD=15.1 THETA=0.20 164YA9 LOD=12.9 THETA=0.24 CHLC.GATA8E05.437 LOD=12.6 THETA=0.25 CHLC.GATA4H06.130 LOD=12.7 THETA=0.26 HSC02E112 LOD=12.2 THETA=0.27 GATA-P19271 LOD=11.8 THETA=0.28 MR5846 LOD=11.7 THETA=0.28When called as a Perl function, this returns a list of linked markers in descending order of lod significance.
42> link_groups 10 TWO_POINT_CUTOFF = 10 Found 2 linkage groups at LOD=10. Group 1: size 47 MR10918 AFMC014WF9 GATA-D18S850 GATA-P6094 318XD5 MR3275 HSC0NA032 MR7850 MR6366 MR3396 MR10303 MR4847 MR3759 MR7458 MR7679 GATA-P19271 CHLC.GATA10A09.966 UTR-03540 GATA-P6006 A081TC5 242YF2 CHLC.GATA4H06.130 312WE9 MR6213 164YA9 A131XH9 GATA-D18S849 MR3726 CHLC.GATA8E05.437 MR9461 MR9606 CHLC.GATA2E06.13 HSC03A012 MR5846 MR11061 HSC02E112 GCT3G01 EST106071 GCT-P10825 GATA-P18613 GATA-P28072 GATA-P19280 UTR-03210 MR11742 MR10536 CHLC.GATA3E08.39 AFMB311WH5 Group 2: size 19 CHLC.GATA11A06.668 MR10467 NIB1802 MR10399 MR3430 GCT-D18S852 B082ZF1 MR5261 A152WG9 A058YG5 GATA-P6694 MR6657 MR10800 MR7774 MR7497 MR6986 MR10742 MR12029 GCT5D07 UNLINKED: 0This command has the side effect of defining several marker groups named LINK_1, LINK_2, and so on, that can be manipulated with the various group commands described later. When called as a Perl function, link_groups returns an array of linkage groups, each of which is an array of marker names.
54> evaluate A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07 NAME BREAK FREQ cR A152WG9 0.020 2.0 MR7497 0.070 7.3 GCT-D18S852 0.118 12.6 MR10399 0.084 8.8 GCT5D07 0.000 0.0 LIKELIHOOD = -71.634071 MAP LENGTH = 30.61Given a marker order, evaluate performs EM maximization and prints out a table showing the estimated breakage frequencies and the calculated likelihood. This calculation is subject to the convergence threshold as well as alpha and beta error rates:
56> set CONVERGENCE_THRESHOLD 0.0001 CONVERGENCE_THRESHOLD = 0.0001 57> evaluate A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07 NAME BREAK FREQ cR A152WG9 0.017 1.7 MR7497 0.068 7.0 GCT-D18S852 0.115 12.2 MR10399 0.082 8.6 GCT5D07 0.000 0.0 LIKELIHOOD = -71.624104 MAP LENGTH = 29.53When called in a Perl expression, evaluate returns the likelihood of the order, and an array of breakage frequencies:
74> $QUIET=1; 75> ($likelihood,@theta)=evaluate(A152WG9,MR7497,'GCT-D18S852', cont: MR10399,GCT5D07) 76> print $likelihood -71.501194 77> print "@theta" 0.019 0.067 0.112 0.081
In the current implementation of groups a given marker can only belong to one group at a time. This restriction may be relaxed in future versions.
30> defg close_markers A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07This places the named markers on a list named "good_markers". They will be removed from any groups they were already on. Two ways to do the same thing using Perl function calls are shown here:
31> &define_group('good_markers','A152WG9','MR7497', \ cont: 'GCT-D18S852','MR10399','GCT5D07') 32> @h = qw/A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07/ 33> &define_group('good_markers',@h)Once a group is defined, it can be used in any command that accepts a list of markers:
38> evaluate close_markers CURRENT_GROUP = close_markers NAME BREAK FREQ cR A152WG9 0.020 2.0 MR7497 0.070 7.3 GCT-D18S852 0.118 12.6 MR10399 0.084 8.8 GCT5D07 0.000 0.0 LIKELIHOOD = -71.634071 MAP LENGTH = 30.61The global variable CURRENT_GROUP contains the group you are currently working with. If you don't provide a group name for commands that manipulate groups, they will assume the group named in this global. This example will have the same result as the previous one:
39> set current_group close_markers CURRENT_GROUP = close_markers 40> evaluateFor convenience, the set_group (setg) command is a shortcut for
set current_group
:
41> set_group close_markers CURRENT_GROUP = close_markers
intersect_groups group1 group2 destination
union_groups group1 group2 destination
subtract_groups group1 group2 destination
delete_group group1 group2 group3...
46> list_groups close_markers (5): A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07 bad_markers (3): 318XD5 MR3396 MR3430
49> save_groups workingThis will save the list of groups into a file named
working
in the subdirectory
./DATA/chr18/groups/
(assuming that the current project
is named "chr18"). You can get a listing of files containing saved
groups with the list_saved_groups command (lsg):
52> lsg total 1 -rw-rw-r-- 1 lstein genome 90 Nov 14 10:44 working
Groups can be restored from a file using with the load_groups (loadg) command:
51> loadg working 2 groups loaded.The format of the saved group files is dirt simple. There's one line per group. Each line starts with the group name, followed by a list of the markers in the group. You are free to edit the saved group file and then reload the groups. For example, here's what the working group file will look like after executing the commands above:
bad_markers 318XD5 MR3396 MR3430 close_markers A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07If you give load_groups and save_groups a file path instead of a bare file name, the routines will load from and save to the desginated file rather than to the groups directory:
52> loadg /home/gumbo/lstein/imported_frameworks
58> @h=group(close_markers) 59> p "@h" A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07
65> permute close_markers A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07 -71.634071 A152WG9 MR7497 GCT-D18S852 GCT5D07 MR10399 -64.999899 . . . MR10399 GCT-D18S852 MR7497 A152WG9 GCT5D07 -68.528327 BEST ORDER: GCT-D18S852 GCT5D07 MR7497 A152WG9 MR10399 NEXT BEST : A152WG9 MR7497 GCT5D07 GCT-D18S852 MR10399 LOD vs nextbest = 1.934764You can call it with a named group as shown here, or with a list of marker names. When called as a Perl function, permute will return the lod for the best order vs the next best order, followed by the best order:
66> ($lod,@order) = permute('close_markers')Because there are N!/2 permutations of a list of N markers, permute will take a very long time for lists greater than 7 in length.
51> ripple 3 close_markers CURRENT_GROUP = close_markers A152WG9 GCT-D18S852 MR7497 MR10399 GCT5D07 -77.081065 MR7497 A152WG9 GCT-D18S852 MR10399 GCT5D07 -72.720026 . . . GRAND BEST ORDER: A152WG9 MR7497 MR10399 GCT-D18S852 GCT5D07 NEXT BEST ORDER: A152WG9 MR7497 GCT-D18S852 MR10399 GCT5D07 LOD vs nextbest = 1.052755We can deduce from this display that we should not be very confident of the order of markers in
close_markers
, as local
permutations alone found another order that is only 10-fold less
likely. In contrast, when this command is run on the best order
found by exhaustive permutation search, the next best order found has
a lod of 2.24, more than 100-fold less likely.
One strategy for fine-tuning orders might be to repeat the ripple sort until the algorithm fails to find local improvements in the order. However, we have not used this technique to create our maps and don't have experience with it.
The function results for ripple_sort
are the lod
difference between the starting order and the final order, and the
final order itself:
61> ($change,@new_order) = ripple_sort(3,'close_markers')
65> span MR6657 A058YG5 Found 53 markers linked to MR6657 or A058YG5 A058YG5 MR6657 MR3275 : SIGNIFICANT BUT FLIPS A058YG5 MR6657 MR7458 : SIGNIFICANT BUT FLIPS A058YG5 MR6657 CHLC.GATA4H06.130 : SIGNIFICANT BUT FLIPS A058YG5 MR6657 MR6213 : SIGNIFICANT BUT FLIPS A058YG5 MR6657 HSC03A012 : SIGNIFICANT BUT FLIPS A058YG5 MR6657 GATA-P28072 : SIGNIFICANT BUT FLIPS The following orders span the markers given: MR6657 GATA-P6694 A058YG5 (4.76667400000001) MR6657 NIB1802 A058YG5 (3.99561999999999) MR6657 MR7774 A058YG5 (3.876677)In this example, three markers were found that fit between MR6657 and A058YG5. Six more markers were found that have unique orders in the triple, but in each of these cases the best position was at the end of the order.
Use extend to extend the order of a partial order. This search will find markers that are linked to the right end of the order. It then determines all permutations of the partial order and reports success if it the chosen marker maps to the right end of the partial order:
5> extend A058YG5 MR6657 MR7458 Found 58 markers linked to MR7458 A058YG5 MR10742 MR6657 MR7458 : SIGNIFICANT BUT FLIPS A058YG5 GATA-P6694 MR6657 MR7458 : SIGNIFICANT BUT FLIPS A058YG5 MR7774 MR6657 MR7458 : SIGNIFICANT BUT FLIPS A058YG5 NIB1802 MR6657 MR7458 : SIGNIFICANT BUT FLIPS The following orders extend the partial order: A058YG5 MR6657 MR7458 CHLC.GATA8E05.437 (4.907392) A058YG5 MR6657 MR7458 CHLC.GATA10A09.966 (4.70801200000001) A058YG5 MR6657 MR7458 HSC0NA032 (4.52770799999999) A058YG5 MR6657 MR7458 CHLC.GATA3E08.39 (4.49122799999999) A058YG5 MR6657 MR7458 164YA9 (4.47804500000001) . . .The Perl function result of both these commands is a list of candidate markers.
62> print_group close_markers CURRENT_GROUP = close_markers A152WG9 2.02 F 0110111110010011011000110... MR7497 7.26 F 0110111110010011011000110... GCT-D18S852 12.56 F 0110111110000001011000110... MR10399 8.77 F 0111111110010001011000110... GCT5D07 0.00 F 0110111110010001011000110...
63> mail_group Chr18.framework
There are several approaches to building frameworks. One is to use external information, such as known genetic order, to nucleate the framework. Once the framework is started, it is easy to grow it by incrementaly adding in markers. See growing an existing framework.
If external ordering information isn't available to get the framework started, you can assemble the framework from scratch by finding a set of well-ordered triples which can be assembled into a well-ordered framework.
A-B-C B-A-C A-C-BThe strategy is to find all the triples in the data set for which one of these three orders is more likely than the others by a high lod score. The triples can then be assembled into a framework as illustrated here:
Triple 1: A-B-C Triple 2: B-C-D Triple 3: C-D-E Triple 4: D-E-F Deduced Order: A-B-C-D-E-FBecause framework building is sensitive to a number of parameters, our strategy is to find all the well-ordered triples in a data set, save them to a file, and then apply various combinations of frame building algorithms and parameters to the file.
The command find_triples is used to search the data set for well-ordered triples. The command's format is:
find_triplesYou provide the name of a file (absolute or relative) to save the triples to and a list of markers to examine. This can be a named group, such as a group created by the find_link_groups command or a list of markers typed on the command line. The equivalent Perl function accepts a Perl list for this argument. For example, here's how to find all the well-ordered triples in the entire data set and dump them out to a file named "triples":
25> set project chr14 26> loadg working 27> find_triples ./triples LINK_1 processing pairs ... After 5000 triples, The best triple in database is: CHLC.GATA11A06.668 GCT-D18S852 MR5261 -79.322560 CHLC.GATA11A06.668 MR5261 GCT-D18S852 -78.088223 MR5261 CHLC.GATA11A06.668 GCT-D18S852 -71.741460 with a lod score of 6.346763 . . . Finally: The best triple in database is: MR3396 MR7679 MR3726 -79.840608 MR3396 MR3726 MR7679 -78.672711 MR3726 MR3396 MR7679 -70.772235 with a lod score of 7.90047600000001find_triples is sensitive to the value of the global parameter FRAMETHRESH (we usually use 3.0). Triples for which the next best order is within FRAMETHRESH of the best order are discarded.
This command takes a while to run: a set of 100 marekrs can take several hours on a DEC alpha, and the running time increases dramatically with more markers. You can reduce running time and increase the ultimate map quality by preselecting your best markers to use for framework building. A good idea would be to select markers containing no missing data.
The format of the file written by find_triples should be self-explanatory.
The two commands have the same arguments and produce the same output, but use slightly different algorithms. The assemble_framework1 seeks to find the longest path involving overlapping triples such that the last two markers of the left triple overlaps with the first two markers of the right triple, for all adjacent triples in the path. During construction, triples which are ordered at a lod of less than TRIPLE_LOD, or which contain gaps in excess of TRIPLE_DIST centirays are discarded. This criterion is very stringent and the paths found using this algorithm tend to be short but reliable.
assemble_framework2 takes a more aggressive approach
by using all the ordering information in the triples to assemble a
path. For example the following pair of triples will be assembled by
assemble_framework2
:
A-B-C A - C-D Will be assembled into: A-B-C-D
assemble_framework2
tends to find longer paths. However the paths may
not be as reliable (this algorithm has not be extensively tested yet).
This command also respects the values of TRIPLE_LOD and TRIPLE_DIST.
Both these commands are invoked in the same way:
5> assemble_framework1 triples Loading triples... There are 298 triples ordered at lod 4.0 and with no gaps larger than 30 cR. Now searching for frameworks... Found a path containing 6 markers, consisting of the triples: EST106071 MR3396 GATA-P6094 6.839365 MR3396 EST106071 312WE9 4.889032 164YA9 312WE9 EST106071 5.579925 312WE9 164YA9 MR10303 4.21074499999999 Path: GATA-P6094 MR3396 EST106071 312WE9 164YA9 MR10303 Found a path containing 6 markers, consisting of the triples: MR10918 CHLC.GATA4H06.130 MR3275 4.607503 . . . Overall longest path contains 12 markers. Longest Path: MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 MR7679 \ MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 GATA-P19280The command prints out all paths that it finds: this is often copious. At the end the single path found will be printed. When called with Perl syntax, the subroutine will return the longest path found. This can then be assigned to a named group and used as the basis for growing a longer framework.
I suggest that you run the ripple test on any framework that these algorithms find. You should also evalute the map to make sure that the framework consists of evenly spaced markers:
7> ripple 3 MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 MR7679 \ cont: MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 GATA-P19280 LOD vs nextbest = 4.30339899999998 8> evaluate MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 \ cont: MR7679 MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 GATA-P19280 NAME BREAK FREQ cR MR11742 0.108 11.4 GATA-D18S850 0.206 23.1 GCT3G01 0.180 19.9 UTR-03540 0.210 23.6 MR3396 0.239 27.3 MR7679 0.164 17.9 MR3275 0.164 17.9 A131XH9 0.134 14.4 MR5846 0.156 17.0 CHLC.GATA8E05.437 0.181 20.0 A081TC5 0.149 16.1 GATA-P19280 0.000 0.0 LIKELIHOOD = -219.031954 MAP LENGTH = 208.5 9> define_group chr18 MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 \ cont: MR7679 MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 GATA-P19280 10> save_group working
For each marker provided
grow_framework
The first parameter is the name of the framework group to try to grow. This should be followed either by a named group or by a list of markers.
If any of the markers provided are already on the framework, you will see a warning message and the marker will be skipped. This command is most useful as a Perl function call where it will return the new framework as the function result.
For example, here's how to add all markers in LINK_1 group to an existing framework group named "chr18" at lod 3.0:
7> set framethresh 3.0 FRAMETHRESH = 3.0 8> loadg working 2 groups loaded. 9> grow_framework chr18 LINK_1 ** 318XD5 ** Stepwise... Rippling pairs... Rippling triples... SUCCEEDED: MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 \ 318XD5 MR7679 MR3275 A131XH9 MR5846 CHLC.GATA8E05.437 A081TC5 \ GATA-P19280 LOD: 3.44092600000002 ** GATA-P6094 ** Stepwise... ...failed with lod 0.278309999999976 ** HSC0NA032 ** Stepwise... Rippling pairs... Rippling triples... SUCCEEDED: MR11742 GATA-D18S850 GCT3G01 UTR-03540 MR3396 318XD5 \ MR7679 MR3275 A131XH9 MR5846 HSC0NA032 CHLC.GATA8E05.437 A081TC5 \ GATA-P19280 LOD: 3.443511 . . . TRIED: 54 ADDED: 7 FINAL ORDER: MR11742 MR9606 GATA-D18S850 MR4847 GCT3G01 UTR-03540 \ MR3396 MR6366 318XD5 MR7679 GATA-P18613 MR6213 MR3275 A131XH9 MR5846 \ HSC0NA032 CHLC.GATA8E05.437 A081TC5 GATA-P19280 Updating chr18... 11> save_groups working
If you've been following the examples so far, the framework map obtained in the previous step contains the Q arm of Chr18. In order to finish the map, it would be necessary to work separately on the other linkage group and then knit the two together using external information.
Creating a placement map is straightforward. The easiest way to do it is with the create_placement_map command. This command accepts the name of the new placement map and a list of markers to place on it. It gets the name of the framework to use to start the map from the CURRENT_GROUP global variable. Here's how to create a placement map named "Chromosome18" from a framework map named "chr18":
13> load_groups working 14> set_group chr18 15> create_placement_map Chromosome18 LINK_1This command will create a placement map named "Chromosome18" and store it in the global variable CURRENT_MAP. As the command runs, it attempts to place all markers in group 'LINK_1' onto the map; as it runs it displays the alternative placements for each marker. Markers that map well are added to the placement.
In addition to specifying the markers to map in a named group, you can present a list:
create_placement_map Chromosome18 HSC0NA032 A081TC5 GATA-P19280 -or- create_placement_map('Chromosome18',group(LINK_1))
create_placement_map is sensitive to the value of
PLACEMENT_TOO_FAR
. If a marker maps off the end of the
framework, it will be discarded if the distance exceeds the value
of this global (15 cR by default). This catches and discards markers
containing many errors.
Once created, the current placement map can be printed in tabular form to the screen with print_map, viewed graphically in a pop up window with display_map, or converted to Macintosh PICT format and mailed to the current user with mail_map.
Here is an example of a graphical map created by mail_map and display_map:
17> save_map chr18.map 18> load_map /home/lstein/old/chr18.mapIf a bare filename is used (no slashes), the map is stored in a subdirectory of the project called "maps". Otherwise, the provided pathname is used. In the first example above, the map is saved to the file
$DATA/chr18/maps/chr18.map
(where $DATA is the database directory).
In the second example, another map is restored from
the file /home/lstein/old/chr18.map
list_saved_maps (listm) will list all maps saved in the "maps" directory.
19> load_map chr18.map 20> add_to_map MR10399 318XD5 MR3396 21> save_map chr18.map
4> find_errors 3 GCT-D18S852 GCT5D07 MR7497 A152WG9 MR10399 GCT-D18S852 011011111000000101100110110010000111100101011000011... GCT5D07 011011111001000101100110110010010111100101011000011... MR7497 011011111001001101100110110010010111000101001000011... ^ A152WG9 011011111001001101100110110010010111010101011000011... MR10399 011111111001000101100110110010010111210101111000011...In this example, the caret marks a single hybrid in marker MR7497 in which there is a 1-0-1 transition. Because a threshold of 3 was specified in the command RHMAPPER is reporting that this result is 1000:1 more likely to represent a laboratory error than a true result. (The results for MR7497 were deliberately altered in order to create this example -- it won't work in the test data set provided with RHMAPPER). In order for these results to be meaningful, the order must be essentially correct.
Called in Perl function call style, find_errors() returns an associative array in which the keys are the names of the flagged markers. The values are associative arrays in which the keys are the hybrid numbers and the values are the lod of error scores:
7> $QUIET=1; %errors = find_errors(3,'Chr18.framework'); 8> foreach $marker (keys %errors) { cont: foreach $hybrid (keys %{$errors{$marker}}) { cont: print "ERROR: $marker on $hybrid, LOD $errors{$marker}->{$hybrid}\n"; cont: } cont: }
To use it, you must have RHMAPPER installed and running. You'll first need to create and load a project named "whitehead", using the data provided in the testdata/ subdirectory:
In this example only the subset of the Whitehead markers that correspond to the framework maps was loaded. The rest are not needed for place_markers to run. You do not need to bring the pairwise comparisons table up to date either, although you can do so (this will take several hours to overnight).zorro: rhmapper 1> set project whitehead PROJECT = whitehead 2> ld testdata/wiframeworks.dat ./bin/table2boulder.pl testdata/wiframeworks.dat | ./bin/load_rh_info -CLn Assuming first two columns columns are marker name and vector Loaded: 1630 Bring the pairwise table up to date? [y]: n 3> quit
To run place_markers, create a file that looks like this:
The first column should contain the names of the marker's you're mapping. The second column should contain the RH vectors in Genebridge 4 order. You may introduce spaces in the vector to increase readability.HK45-L 0111010112001110101101101001010101111100... MBQH2 0110002000000001111101111001001120001010... Glue2 1111001110100101001101010011101120011010...
Now run the file through place_markers in this way:
place_markers has many command line switches to control its behavior. Among the more useful are:zorro% bin/place_markers <datafile NAME Chromosomal Assignment and Placement(s) ---- --------------------------------------- HK45-L Chromosome Chr9 Places 70.52 cR from WI-3028 (lod >3.0) MBQH2 Chromosome Chr4 Places 0.00 cR from WI-4262 (lod >3.0) Glue2 ** No linkages at lod 15 ** PICTURES, if requested, are attached to the bottom of this message. Submitted markers in the context of Whitehead framework map: Placement Data Relative to Framework for Chromosome Chr4 Name Dist Type Vector ---- ---- ---- ------ WI-6657 8.2 F 010100010100000001001011011011010100... WI-5430 4.7 F 000100010100100001001011011010011100... D4S2925 5.4 F 000000010100100001000011010010011100... CHLC.GATA42 11.6 F 010000010100100001000011010010011100...
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
This product includes software developed by the Whitehead Institute for Biomedical Research.
We also request that use of this software be cited in publications as:
L. Stein, L. Kruglyak, D. Slonim and E. Lander. (1995) "RHMAPPER", unpublished software, Whitehead Institute/MIT Center for Genome Research. Available at http://www.genome.wi.mit.edu/ftp/pub/software/rhmapper/, and via anonymous ftp to ftp.genome.wi.mit.edu, directory /pub/software/rhmapper.
THIS SOFTWARE IS PROVIDED BY THE WHITEHEAD INSTITUTE ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE WHITEHEAD INSTITUTE BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Index | Cell Line | Index | Cell Line | Index | Cell Line | Index | Cell Line | ||
---|---|---|---|---|---|---|---|---|---|
1 | 4A4 | 21 | 4E2 | 41 | 4J9 | 61 | 4P11 | 81 | 4V2 |
2 | 4A5 | 22 | 4E4 | 42 | 4K5 | 62 | 4Q2 | 82 | 4V3 |
3 | 4AA5 | 23 | 4E6 | 43 | 4K7 | 63 | 4Q4 | 83 | 4V7 |
4 | 4AA7 | 24 | 4E11 | 44 | 4K8 | 64 | 4R1 | 84 | 4V8 |
5 | 4B2 | 25 | 4F6 | 45 | 4K9 | 65 | 4R2 | 85 | 4W1 |
6 | 4B3 | 26 | 4F7 | 46 | 4K12 | 66 | 4R3 | 86 | 4Y4 |
7 | 4B9 | 27 | 4F13 | 47 | 4L3 | 67 | 4R5 | 87 | 4Y8 |
8 | 4BB1 | 28 | 4G1 | 48 | 4L4 | 68 | 4R6 | 88 | 4Y9 |
9 | 4BB6 | 29 | 4G5 | 49 | 4L6 | 69 | 4R10 | 89 | 4Z5 |
10 | 4BB8 | 30 | 4G6 | 50 | 4M4 | 70 | 4R12 | 90 | 4Z6 |
11 | 4BB10 | 31 | 4G7 | 51 | 4M5 | 71 | 4S3 | 91 | 4Z9 |
12 | 4BB12 | 32 | 4G11 | 52 | 4N3 | 72 | 4S6 | 92 | 4Z11 |
13 | 4C3 | 33 | 4H1 | 53 | 4N5 | 73 | 4S10 | 93 | 4Z12 |
14 | 4C11 | 34 | 4H8 | 54 | 4N6 | 74 | 4S12 | ||
15 | 4CC8 | 35 | 4H9 | 55 | 4N7 | 75 | 4T3 | ||
16 | 4D1 | 36 | 4H12 | 56 | 4N12 | 76 | 4T4 | ||
17 | 4D7 | 37 | 4I1 | 57 | 4O5 | 77 | 4T10 | ||
18 | 4DD2 | 38 | 4I4 | 58 | 4O10 | 78 | 4T11 | ||
19 | 4DD5 | 39 | 4J2 | 59 | 4P2 | 79 | 4U1 | ||
20 | 4DD8 | 40 | 4J5 | 60 | 4P9 | 80 | 4U3 |