Comparative mapping

When a subset of the markers is associated with genes that have a known orthologous relationship with genes of a completely sequenced genome, the order of the orthologous genes in the sequenced genome (the reference order) can be exploited to map the genome of interest. The rationale is that closely related organisms differ by a small number of chromosomal rearrangements relative to the number of genes they have in common, and therefore share long conserved chromosomal segments. The comparative mapping approach in CARTHAGENE extends its traditional maximum multipoint likelihood criterion by incorporating a non-uniform prior distribution on the possible orders. The a priori probability of an order is a function of its evolutionary distance to a given reference order. This distance is measured by the number of breakpoints between the order and the reference order, that is, the number of pairs of adjacent markers in the order that are not adjacent in the reference order.
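The breakpoint count described above can be illustrated with a minimal Python sketch. The function name `count_breakpoints`, the marker names, and the decision to ignore markers that have no counterpart in the reference order are assumptions made for this example. They are not CARTHAGENE commands and do not claim to reproduce its implementation.

```python
def count_breakpoints(order, reference):
    """Count pairs of adjacent markers in `order` that are not adjacent in `reference`.

    Assumption of this sketch: markers without a known ortholog (absent from
    `reference`) are dropped before counting.
    """
    # Position of each marker in the reference order.
    pos = {marker: i for i, marker in enumerate(reference)}
    # Keep only the markers that have a counterpart in the reference order.
    common = [marker for marker in order if marker in pos]
    breakpoints = 0
    for a, b in zip(common, common[1:]):
        # Adjacency is unsigned: a reversed segment keeps its internal adjacencies.
        if abs(pos[a] - pos[b]) != 1:
            breakpoints += 1
    return breakpoints


reference = ["A", "B", "C", "D", "E"]
print(count_breakpoints(["A", "C", "B", "D", "E"], reference))  # 2 (A-C and B-D)
```

Because adjacency is treated as unsigned here, an order and its mirror image have the same number of breakpoints, which matches the fact that a genetic map has no intrinsic orientation.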

CARTHAGENE proposes two approaches for maximizing this new comparative mapping criterion. The first approach replaces the traditional criterion with the comparative mapping criterion, after which the usual CARTHAGENE commands can be used to build a first map and to improve it. The second approach finds a set of maps and then chooses the best map within this set; it is explained in the paragraph following Figure 2.1. The first approach should be used if the number of selected markers is small enough (fewer than one hundred), or if the number of known orthologous relationships is close to the number of selected markers and the amount of missing data is limited (in this case the linear approximation of the comparative mapping criterion as a Traveling Salesperson Problem works well). Otherwise, the second approach is recommended. The two approaches are independent of each other.

**Figure 2.1:** The comparative mapping criterion, which takes into account a biological dataset and the number of breakpoints to a reference order, versus the traditional mapping criterion, which takes into account the biological dataset only.
[Left image: `bpcircles`. Right image: `exemple/Data/chr1mlecolor`.]

In Figure 2.1 (left), all the possible marker orderings are represented by concentric circles, each circle corresponding to the set of orders having a fixed number of breakpoints to the reference order. Intuitively, the a priori probability of finding the true order at a particular number of breakpoints is inversely proportional to the number of possible orders at this number of breakpoints. For a fixed number of breakpoints, the optimal order is the one with the maximum multipoint likelihood, taking into account the biological dataset only. To find the best order given a biological dataset and a reference order, the comparative mapping approach first searches for the optimal order at every number of breakpoints. These orders are represented in the figure by a set of points on a straight line segment delimited by the reference order (zero breakpoints) and the order that maximizes the traditional mapping criterion (here, five breakpoints). This set of optimal orders defines the so-called Pareto frontier^2.4. The order that maximizes the comparative mapping criterion is then chosen from the Pareto frontier. Figure 2.1 (right) shows the two criteria for the different orders in the Pareto frontier. In this example, taken from mouse chromosome 1 using the human genome as a reference order, the best map has 7 breakpoints, compared to the 41 breakpoints found by the traditional approach based on the biological dataset only. For this example, the true map has 6 breakpoints.
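The selection procedure described above can be sketched in Python on a toy problem. Everything specific in this sketch is an assumption made for illustration: the function names, the invented log-likelihood values, the brute-force enumeration of orders (feasible only for a handful of markers), and the exact form of the prior, taken here as one over the number of orders at a given breakpoint count and added in log form to a natural-log likelihood. The criterion actually implemented in CARTHAGENE may differ.

```python
from collections import Counter
from itertools import permutations
from math import log


def count_breakpoints(order, reference):
    """Pairs of adjacent markers in `order` that are not adjacent in `reference`."""
    pos = {marker: i for i, marker in enumerate(reference)}
    return sum(1 for a, b in zip(order, order[1:]) if abs(pos[a] - pos[b]) != 1)


def orders_per_breakpoint_count(reference):
    """Brute-force count of the possible orders at each number of breakpoints."""
    return Counter(count_breakpoints(p, reference) for p in permutations(reference))


def pareto_frontier(candidates, reference):
    """Keep, for each number of breakpoints, the candidate with the highest log-likelihood.

    `candidates` is an iterable of (order, log_likelihood) pairs.
    """
    best = {}
    for order, loglik in candidates:
        b = count_breakpoints(order, reference)
        if b not in best or loglik > best[b][1]:
            best[b] = (order, loglik)
    return best


def best_comparative_order(frontier, n_orders):
    """Pick the frontier order maximizing log-likelihood + log prior (assumed prior: 1 / n_orders[b])."""
    def score(b):
        return frontier[b][1] - log(n_orders[b])
    b = max(frontier, key=score)
    return b, frontier[b][0]


# Toy data: four markers and invented log-likelihoods (not real results).
reference = ("A", "B", "C", "D")
candidates = [
    (("A", "B", "C", "D"), -11.0),
    (("A", "B", "D", "C"), -10.5),
    (("B", "A", "D", "C"), -11.5),
    (("A", "C", "B", "D"), -10.0),
]
n_orders = orders_per_breakpoint_count(reference)
frontier = pareto_frontier(candidates, reference)
print(best_comparative_order(frontier, n_orders))
```

In this toy case the order with the highest likelihood alone has two breakpoints, but once the assumed prior is added the reference order itself scores best, which mirrors how the comparative criterion can favour a map with fewer breakpoints than the traditional one.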