Comprehensive mapping

The first simplest command to produce a map is the sem command. SEM is a shortcut for ``Single EM'': it will produce a map using the current order used to specify the list of active markers (specified using the mrkselset command). The map parameters are automatically optimized by the EM algorithm (hence the name of the command). Naturally, unless you have some good idea of the markers ordering, this command has no reason to produce a likely map.

The next level of sophistication is provided by the nicemapl, nicemapd, mfmapl and mfmapd commands that exploit 2-points measures such as 2-points LOD and 2-points distances. A strong limitation of such commands is that often, in multiple population mapping, some 2-points measures may be undefined. this is the case eg. when a marker is typed in one population and not in another. In such case, it is likely that such commands will produce poor maps. The heuristics used to build maps is the so-called ``nearest neighbor'' heuristics. It consists in adding as a new marker the marker which is the closer (resp. more strongly linked) to the previous marker. The heuristics is called repeatedly starting from all possible markers and the best map produced (w.r.t. loglikelihood) is reported. There is another kind of heuristic based on the powerful Keld Helsgaun's LKH software. LKH implements the Lin-Kernighan heuristic for solving Traveling Salesman Problems. See commands lkh, lkhn, lkhl, lkhd, lkhocb and lkhocbn. It is also possible to use another TSP solver by using the cg2tsp command in order to export the marker ordering problem into a TSP format. The commands fconcorde and flkh use external system calls (only on Linux computers) to William Cook's Concorde TSP exact solver and LKH respectively.

The next level of sophistication is provided by the build function, inspired by similar facilities in CRIMAP [Gre88]. This command starts by examining all pairs of markers in the current list of active markers and keeps the pair that has a maximum likelihood. Then a new marker is selected: it is the markers which is the most strogly linked with markers already in the map. It is tentatively inserted in all possible positions and the best position is kept. Because this process is still far from always yielding good maps, it is typically performed on several distinct maps in parallel (the maps are the $k$ best maps found at each step instead of just the best one). The parameter $k$ is specified as a numerical argument to the build command. This command is largely superseded by the framework mapping command buildfw which can also perform comprehensive mapping.

Thomas Schiex 2009-10-27