Building a Pareto frontier from scratch

The first step is to combine a biological dataset and a reference order dataset using the merging operator dsmergor. The second step is to build an initial Pareto frontier using the paretolkh or paretolkhn command. The resulting frontier is stored into the CARTHAGENE heap the size of which has been resized to the number of selected markers2.5. The previous content of the heap has no effect in the mapping process and was first discarded. The process of building a Pareto frontier is based on an efficient transformation from the comparative mapping criterion to a bi-objective Traveling Salesman Problem. The resulting frontier is only an approximation of the exact frontier. The quality of this approximation depends on the number of missing data and the number of orthologous markers2.6. The best configuration corresponds to the case of no missing data and all markers belonging to the reference order dataset. Even in this case, the resulting frontier is still an approximation due to the bi-objective solving technique used in CARTHAGENE. Thus, it is still possible to improve the frontier, see Section 2.7.7.

For efficiency reasons, CARTHAGENE is using the Keld Helsgaun's LKH software (non GPL code). Currently, the implemented lkh command works only with backcross, haploid RH, or a combination of both data types (dsmergor)2.7.

It is possible to speed up the paretolkh or paretolkhn commands by changing the way CARTHAGENE evaluates a map before it is inserted into the heap. See the cg2pt command. Moreover, the paretolkh or paretolkhn commands are using a temporal heap during their process the size of which is arbitrarily fixed to 1024 (all the maps found at each step of the process are stored into this temporal heap before being evaluated in order to update the Pareto frontier). This size can be increased if the number of orthologous markers is large (see the paretoheapsize command).

Thomas Schiex 2009-10-27