User and reference documentation

When you use CARTHAGENE, the software implicitly manipulates three objects that are worth mentioning: the current data set, the current list of selected markers, and the current set of best known maps (also referred to as the heap).

Although CARTHAGENE can load several data sets, only one data set is active at any given moment. This data set may be a single-population data set or a multiple-population (so-called "merged") data set. The active data set is always the last data set that has been created (loaded or merged), and all further computations are done with respect to it. Note that marker names are very important, since CARTHAGENE uses them to determine whether two markers in two different data sets are identical. Besides its name, each marker receives a numerical id that CARTHAGENE uses frequently. You can easily convert between marker names and marker ids with the mrkname/mrknames and mrkid/mrkids commands.
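To make the name/id bookkeeping concrete, here is a minimal Python sketch under stated assumptions. The `Session` class, its methods `create_dataset`, `id_of` and `name_of`, and the rule that ids are handed out sequentially from 1 are all hypothetical illustrations, not CARTHAGENE's API or its actual numbering scheme. The sketch only models two documented behaviours: markers are matched across data sets by name, and the most recently created data set is the active one.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of data-set and marker-id bookkeeping (not CARTHAGENE code)."""
    datasets: list = field(default_factory=list)    # every data set created so far, in order
    name_to_id: dict = field(default_factory=dict)  # marker name -> numerical id
    id_to_name: dict = field(default_factory=dict)  # numerical id -> marker name

    def create_dataset(self, label, marker_names):
        """Register a loaded or merged data set; it becomes the active one."""
        for name in marker_names:
            # Markers are identified by name only: a name already seen in
            # another data set keeps its id; a new name gets the next id
            # (sequential numbering from 1 is an assumption of this sketch).
            if name not in self.name_to_id:
                new_id = len(self.name_to_id) + 1
                self.name_to_id[name] = new_id
                self.id_to_name[new_id] = name
        self.datasets.append((label, list(marker_names)))

    @property
    def active(self):
        """The active data set is simply the last one created."""
        if not self.datasets:
            raise RuntimeError("no data set has been loaded")
        return self.datasets[-1]

    def id_of(self, name):
        """Name -> id, playing the role that mrkid plays in CARTHAGENE."""
        return self.name_to_id[name]

    def name_of(self, marker_id):
        """Id -> name, playing the role that mrkname plays in CARTHAGENE."""
        return self.id_to_name[marker_id]
```

For example, after loading a population with markers M1, M2, M3 and then a second one with M2 and M4, the second data set is active, M2 keeps the same id in both, and M4 receives a fresh id.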

Once the populations you intend to work with have been loaded and merged as you want, mapping is essentially a problem of choosing and ordering a set of markers. At any time, CARTHAGENE maintains an implicit list of selected markers that specifies the set of markers you intend to work with. This list is also used as the default marker ordering by some commands (e.g. the sem command, see section 2.5.2). When a data set is created (loaded or merged), all of its markers are selected. You can view and modify this selection (see the mrkselset command, section 2.3.14, and the mrkadd, mrksub and mrkdel commands).
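The role of the selection as both "markers to work with" and "default ordering" can be sketched as an ordered list. This is a hypothetical Python illustration: the `MarkerSelection` class and its `reset`, `set`, `add`, `remove` and `default_order` methods are invented for the example, and their semantics are assumptions, not a description of what mrkselset, mrkadd, mrksub or mrkdel do exactly.

```python
class MarkerSelection:
    """Hypothetical sketch of the implicit selected-marker list (not CARTHAGENE code)."""

    def __init__(self):
        self.markers = []  # ordered list of selected marker ids

    def reset(self, dataset_marker_ids):
        """Creating (loading or merging) a data set selects all of its markers."""
        self.markers = list(dataset_marker_ids)

    def set(self, marker_ids):
        """Replace the whole selection; the given order becomes the default ordering."""
        self.markers = list(marker_ids)

    def add(self, marker_id):
        """Append a marker at the end of the selection if it is not already selected."""
        if marker_id not in self.markers:
            self.markers.append(marker_id)

    def remove(self, marker_id):
        """Drop a marker from the selection (does nothing if it is absent)."""
        if marker_id in self.markers:
            self.markers.remove(marker_id)

    def default_order(self):
        """Order used by commands that need one but are not given one explicitly."""
        return tuple(self.markers)


sel = MarkerSelection()
sel.reset([1, 2, 3, 4])  # a new data set: everything selected
sel.remove(3)
sel.add(7)
print(sel.default_order())  # (1, 2, 4, 7)
```

Keeping the selection as a list rather than a set is what lets it double as a default marker ordering.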

The last important object in CARTHAGENE is the so-called heap. The heap is simply a bounded-size container holding the best maps for the current marker selection that CARTHAGENE has encountered during the session. Each time CARTHAGENE evaluates a map likelihood, either inside a complex ordering strategy or at the user's direct request, the map becomes a candidate for the heap: if the heap is not full, or if the new map has a better likelihood than the worst map in the heap, the new map is stored for further analysis. Once enough alternative orders have been considered, you can examine the heap to see how strongly the data support the best order, by looking at the alternative sub-optimal orders found during the search. The heap size is user configurable and defaults to 15 maps. Because the heap is implemented with an efficient data structure, it can be fairly large (e.g. more than 1000 maps) without slowing down the software, so don't hesitate to change the default of 15 if needed (see the heapsize command, section 2.4.2).
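The admission rule described above (keep a candidate if there is room, or if it beats the current worst map) is the classic bounded "top-k" structure. The sketch below uses Python's `heapq` module as a stand-in; CARTHAGENE's actual data structure is not documented here. The `MapHeap` class and its methods are hypothetical, log-likelihoods are assumed to be "higher is better", and skipping an order that is already stored is an assumption of this sketch. A min-heap keyed on log-likelihood keeps the worst stored map at the root, so each admission or eviction costs O(log n), which is why a large heap stays cheap.

```python
import heapq


class MapHeap:
    """Bounded container keeping the `size` best maps seen so far (hypothetical sketch)."""

    def __init__(self, size=15):
        if size < 1:
            raise ValueError("heap size must be at least 1")
        self.size = size
        self._entries = []    # min-heap of (log_likelihood, order): worst map at index 0
        self._orders = set()  # orders currently stored, used to skip duplicates (assumption)

    def consider(self, loglik, order):
        """Offer an evaluated map; return True if it was kept in the heap."""
        order = tuple(order)
        if order in self._orders:
            return False
        if len(self._entries) < self.size:
            # There is still room: keep the map unconditionally.
            heapq.heappush(self._entries, (loglik, order))
        elif loglik > self._entries[0][0]:
            # Better than the current worst map: replace it in O(log n).
            _, evicted = heapq.heapreplace(self._entries, (loglik, order))
            self._orders.discard(evicted)
        else:
            return False
        self._orders.add(order)
        return True

    def best(self, k=None):
        """Return stored maps sorted from best to worst log-likelihood."""
        ranked = sorted(self._entries, reverse=True)
        return ranked if k is None else ranked[:k]
```

With `MapHeap(size=2)`, offering maps with log-likelihoods -120.5, -118.0, -125.0 and -119.0 keeps only the last two admissible ones: -125.0 is rejected because the heap is full and it is worse than the worst stored map, and -119.0 then evicts -120.5.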

The heap has another implicit use: several CARTHAGENE commands that need a marker ordering (or map) to start with automatically take the best map in the heap as their starting point. Consequently, if the heap is empty because you have not yet computed the likelihood of any map, these commands will complain that they have no starting point. It is up to you to fill the heap using any map building command (see section 2.5).
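The "start from the best map, or complain if there is none" behaviour amounts to a maximum over the heap contents with an explicit empty check. The helper below is a hypothetical Python illustration (`starting_map` is not a CARTHAGENE command); it assumes heap entries are `(log_likelihood, marker_order)` pairs where a higher log-likelihood is better.

```python
def starting_map(heap_entries):
    """Return the best (log_likelihood, order) pair to start from (hypothetical helper).

    `heap_entries` is any iterable of (log_likelihood, marker_order) pairs,
    for instance the current contents of the heap.
    """
    entries = list(heap_entries)
    if not entries:
        # No map has been evaluated yet, so there is no starting point.
        raise LookupError("the heap is empty: build at least one map first")
    # Highest log-likelihood wins.
    return max(entries, key=lambda entry: entry[0])


print(starting_map([(-120.5, (1, 2, 3)), (-118.0, (2, 1, 3))]))  # (-118.0, (2, 1, 3))
```

Raising an error rather than silently picking some default order matches the documented behaviour: the user has to fill the heap with a map building command first.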

Thomas Schiex 2009-10-27