Maps and the Heap

A map in CARTHAGENE defines the parameters of a probabilistic model of the genetic/RH data available on the currently active set of markers. In the simplest cases (e.g. one genetic population), a map is completely defined by two components: an order of the markers and the recombination probabilities between adjacent markers. As in most existing mapping software (e.g. MapMaker), all the probabilistic models used in CARTHAGENE assume a complete absence of interference on genetic data. In the case of RH data, recombination probabilities are replaced by breakage probabilities, along with a retention probability parameter. The probabilistic model used is the so-called ``equal retention model'' [LBLC95], which seems to be an ideal compromise between simplicity, efficiency and realism.

Since a map fixes all the parameters of the probabilistic model, it is possible to compute the probability of the data given the model, usually called the likelihood of the data. Note that the continuous part of the probabilistic model (the probabilities) is always estimated using a maximum-likelihood criterion in CARTHAGENE. This criterion is rigorous and has several attractive theoretical properties. It is optimised in CARTHAGENE using variants of the EM algorithm [DLR77].
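To make the notions of likelihood and maximum-likelihood estimation concrete, here is a minimal sketch for the simplest possible case: two markers in a backcross, where each informative meiosis is either recombinant or not. This is an illustration under stated assumptions, not CARTHAGENE code: the function names `log_likelihood` and `ml_theta` are hypothetical, and the closed-form estimate used here only exists because the data are complete. For multipoint maps with missing data, CARTHAGENE relies on variants of the EM algorithm instead.

```python
import math


def _count_times_log(count, p):
    """Return count * log(p), treating 0 * log(0) as 0."""
    if count == 0:
        return 0.0
    if p <= 0.0:
        return float("-inf")
    return count * math.log(p)


def log_likelihood(theta, recombinants, meioses):
    """Natural-log likelihood of two-marker backcross data (assumed model).

    theta        : recombination probability between the two markers
    recombinants : number of recombinant meioses observed
    meioses      : total number of informative meioses
    """
    non_recombinants = meioses - recombinants
    return (_count_times_log(recombinants, theta)
            + _count_times_log(non_recombinants, 1.0 - theta))


def ml_theta(recombinants, meioses):
    """Maximum-likelihood estimate of theta, capped at 0.5 (no linkage)."""
    return min(recombinants / meioses, 0.5)


if __name__ == "__main__":
    theta_hat = ml_theta(10, 100)
    print(theta_hat, log_likelihood(theta_hat, 10, 100))
```

Any other value of the recombination probability gives a lower log-likelihood on the same data, which is what the maximum-likelihood criterion expresses.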

The probabilistic parameters used in the model can be converted to so-called distances using traditional mapping functions: for genetic data, the Haldane and Kosambi functions convert recombination probabilities to Morgans; for RH data, the usual $-\log(1.0-\theta)$ converts breakage probabilities to Rays. Note that the probabilistic model used is consistent with the Haldane mapping function, since both assume an absence of interference. The Kosambi mapping function, although usually the preferred one, has theoretical weaknesses and we advise against using it.
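The three conversions mentioned above can be sketched as follows. The formulas are the standard textbook definitions of the Haldane and Kosambi mapping functions and of the Ray distance; the function names are hypothetical and are not part of CARTHAGENE's command set.

```python
import math


def haldane_morgans(theta):
    """Haldane: recombination probability -> Morgans (no interference)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


def kosambi_morgans(theta):
    """Kosambi: recombination probability -> Morgans (models interference)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def breakage_rays(theta):
    """RH data: breakage probability -> Rays, i.e. -log(1 - theta)."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -math.log(1.0 - theta)


if __name__ == "__main__":
    for theta in (0.01, 0.1, 0.3):
        print(theta, haldane_morgans(theta), kosambi_morgans(theta),
              breakage_rays(theta))
```

For small probabilities, the Haldane and Kosambi distances are close to each other; they diverge as the recombination probability approaches 0.5, with Haldane giving the larger distance.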

When several datasets are merged using the dsmergen command, the underlying assumption is that all datasets share the same parameters, so a single ``consensus'' map suffices to define a probabilistic model for all the data. Such maps do not suffice for more complex models, e.g. when several datasets are merged using the dsmergor command. In this case, the assumption is that all datasets merged using dsmergor share a consistent order, but each has its own specific distances (i.e., probabilities).

A map is then defined by a collection of simple maps (as above) that share a consistent order.
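The two kinds of maps described above can be pictured with the following data-structure sketch. It is only an illustration: the class names `SimpleMap` and `CompositeMap` and the helper `is_consistent` are hypothetical and do not reflect CARTHAGENE's internal representation. The sketch further assumes that each dataset may carry only a subset of the markers, and that "consistent order" means each simple map lists its markers in the same relative order as a common order.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimpleMap:
    """An order of markers plus one probability per adjacent pair."""
    order: Tuple[str, ...]
    probabilities: Tuple[float, ...]

    def __post_init__(self):
        expected = max(len(self.order) - 1, 0)
        if len(self.probabilities) != expected:
            raise ValueError("need one probability per adjacent marker pair")


def is_consistent(common_order, simple_map):
    """True if the map's order is common_order restricted to its markers.

    Orientation is taken as given: a reversed order is reported as
    inconsistent in this simplified sketch.
    """
    markers = set(simple_map.order)
    restricted = tuple(m for m in common_order if m in markers)
    return restricted == simple_map.order


@dataclass(frozen=True)
class CompositeMap:
    """A collection of simple maps sharing a consistent order."""
    common_order: Tuple[str, ...]
    maps: Tuple[SimpleMap, ...]

    def __post_init__(self):
        for simple_map in self.maps:
            if not is_consistent(self.common_order, simple_map):
                raise ValueError("simple map order conflicts with common order")


if __name__ == "__main__":
    first = SimpleMap(("A", "B", "C"), (0.10, 0.20))
    second = SimpleMap(("A", "C"), (0.25,))
    composite = CompositeMap(("A", "B", "C"), (first, second))
    print(len(composite.maps))
```

In this picture, a ``consensus'' map corresponds to a single `SimpleMap` shared by all datasets, whereas an order-only merge keeps one `SimpleMap` per dataset under a common order.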

Thomas Schiex 2009-10-27