# Improving and validating existing maps

Map improving and validating commands all try to find a map whose loglikelihood is improved over an initial map. For all such commands, the initial map used is the best map in the heap. Beyond a possible map improvement, all these commands may affect the contents of the heap: all maps considered by these commands are candidate for insertion.

The first category of commands are simple heuristics command that slightly perturbate the initial map in a systematic manner:

• The famous flips command applies all possible permutations in a sliding window of fixed size on the initial map. The command reports all maps found whose loglikelihood lies within a given threshold of the loglikelihood of the original map. If a new improved map is found, the command may be automatically iterated. It can also be used to explore all possible orders when a small number of markers is selected by choosing a window size equal to the number of markers.
• the polish command removes one marker of the initial map and tries to insert it in all possible intervals. The variation in loglikelihood is reported for all markers and all positions in a matrix.
• the squeeze command tries to detect and remove erroneous markers such that their removal corresponds to a large reduction in the total size of the initial map.

These commands are simple and systematic. CARTHAGENE offers much more powerful commands that use a class of discrete optimization techniques called ``stochastic optimization'' or more precisely ``local search methods''. This class of methods is usually acknowledged as the most efficient class of technique for hard discrete optimisation problems such as the genetic marker ordering2.2

These optimisation methods all follow a similar principle: one or several current orders are stochastically ``perturbated'' and depending on the result of the pertubation and a random choice, the perturbated maps may replace the original maps or not. Three such methods are integrated in CARTHAGENE: simulated annealing [LA84], taboo search [Glo89,Glo90] and genetic algorithm [Gol89]. The three methods use the same perturbation mechanism: a subsection of the map is chosen and flipped 2.3. Simulated annealing and the genetic algorithm may also use more complex perturbation techniques.

Moreover, it is possible to assess the robustness of a given map by looking at the map distribution using Markov Chain Monte Carlo. For that, see the mcmc command. It works only in the comparative mapping approach.

These commands described next can take a long time to improve or not the current solution. The system does not provide a prediction of the time cost of this algorithms. Nevertheless, these methods can be interrupted online, without loss of information (see section 2.6.11).

Subsections
Thomas Schiex 2009-10-27