Building maps from scratch

We can now start building maps. Every map built is stored in a dedicated structure called the ``heap'', which remembers the best maps found during the whole map search. To put a first map in the heap, we simply ask CarthaGene to assess the quality of the default order specified in the mrkselset command. This is done using the sem command, which computes the true multipoint maximum likelihood of the current order and prints the corresponding map (the marker order used for printing may be reversed w.r.t. the marker selection; see section 2.4.10).

CG> sem

Map -1 : log10-likelihood =  -169.89
-------:
 Set : Marker List ...
   1 : L029 A079 A059 A036 M232 D022 M237 M030 M076 M034 T018 T035 L078 L00...
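
The ``heap'' can be pictured as a bounded collection that keeps only the best-scoring orders seen so far. The following Python sketch illustrates that idea only: the class name MapHeap, its methods, the capacity and the three-marker orders are all hypothetical, and nothing here is taken from CarthaGene's actual implementation. Only the three log10-likelihood values come from this section.

```python
import heapq


class MapHeap:
    """Hypothetical bounded store keeping the `capacity` best maps.

    A map is represented as (log-likelihood, marker order); a higher
    log-likelihood is better. This is an illustration, not CarthaGene code.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = []  # min-heap: the worst stored map sits at index 0

    def insert(self, loglike, order):
        """Store the map if there is room or if it beats the worst stored map."""
        entry = (loglike, tuple(order))
        if len(self._items) < self.capacity:
            heapq.heappush(self._items, entry)
            return True
        if loglike > self._items[0][0]:
            heapq.heapreplace(self._items, entry)  # evict the worst map
            return True
        return False

    def sorted_maps(self):
        """Return the stored maps, best first."""
        return sorted(self._items, reverse=True)


# Toy usage with made-up orders; the scores are those reported in this section.
heap = MapHeap(capacity=2)
heap.insert(-169.89, ["A", "B", "C"])
heap.insert(-72.96, ["A", "C", "B"])
heap.insert(-70.86, ["C", "A", "B"])
best_first = heap.sorted_maps()
```

With a capacity of 2, the third insertion evicts the map scoring -169.89, so only the two best maps remain.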

To build better maps, we can use heuristic map-building procedures. The two simple procedures nicemapl and nicemapd build reasonably good maps using respectively 2-point LOD and 2-point distances as a guide (trying to put markers with strong LOD/small distances close together). The two slightly more complex procedures mfmapl and mfmapd tend to provide better results. Both classes of heuristics are derived from usual travelling salesman problem heuristics. In all cases, the true multipoint maximum likelihood of the resulting order is then computed and the map is printed.

CG> nicemapl

Map -1 : log10-likelihood =   -72.96
-------:
 Set : Marker List ...
   1 : L029 L010 L078 T035 D022 A059 L001 A079 M030 M232 T018 M237 M076 A03...

CG> nicemapd

Map -1 : log10-likelihood =   -70.86
-------:
 Set : Marker List ...
   1 : L029 L010 L078 T035 D022 L001 A059 A079 M030 M232 T018 M237 M076 A03...
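
To give an idea of what a 2-point guided heuristic can look like, here is a minimal Python sketch of the classic nearest-neighbour travelling salesman heuristic applied to 2-point LOD scores. It is an assumption made for illustration: the exact algorithm behind nicemapl is not described in this section, and the function name, the marker names m1 to m4 and the LOD values below are all made up.

```python
def nearest_neighbour_order(markers, lod, start):
    """Greedy chain: repeatedly append the unplaced marker that has the
    highest 2-point LOD with the marker currently at the end of the chain."""
    order = [start]
    remaining = set(markers) - {start}
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic
        nxt = max(sorted(remaining), key=lambda m: lod[frozenset((last, m))])
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Made-up symmetric 2-point LOD scores for four hypothetical markers.
lod = {
    frozenset(("m1", "m2")): 12.0,
    frozenset(("m1", "m3")): 3.0,
    frozenset(("m1", "m4")): 1.0,
    frozenset(("m2", "m3")): 9.0,
    frozenset(("m2", "m4")): 2.5,
    frozenset(("m3", "m4")): 15.0,
}
markers = ["m1", "m2", "m3", "m4"]
order_from_m1 = nearest_neighbour_order(markers, lod, "m1")
order_from_m3 = nearest_neighbour_order(markers, lod, "m3")
```

The result depends on the starting marker, which is one reason why such an order is only a starting point whose multipoint likelihood still has to be computed.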

The log10-likelihoods of the maps found using these two heuristics are -72.96 and -70.86 respectively, both far better than the -169.89 of the default order. We can have a closer look at the maps built up to this point by asking for a detailed view of all the maps stored in the ``heap''. This is achieved using the heaprintd command:

CG> heaprintd

Map  0 : log10-likelihood =  -169.89, log-e-likelihood =  -391.19
-------:

Data Set Number  1 :

      Markers        Distance    Cumulative  Distance   Theta       2pt
Pos  Id name         Haldane     Haldane     Kosambi    (%%age)      LOD

  1  20 L029          29.9 cM     29.9 cM     24.3 cM    22.5 %%     4.4
  2 284 A079           2.2 cM     32.2 cM      2.2 cM     2.2 %%    18.4
  3 277 A059          21.6 cM     53.7 cM     18.3 cM    17.5 %%     6.2
  4 255 A036          13.0 cM     66.8 cM     11.7 cM    11.5 %%     9.0
  5 220 M232          10.1 cM     76.9 cM      9.2 cM     9.1 %%     4.8
  6 239 D022          13.9 cM     90.8 cM     12.4 cM    12.2 %%     3.2
  7 186 M237           8.7 cM     99.4 cM      8.0 cM     7.9 %%    11.0
  8 132 M030           8.7 cM    108.1 cM      8.0 cM     7.9 %%    11.0
  9  99 M076           5.9 cM    114.0 cM      5.6 cM     5.6 %%    13.0
 10  94 M034          12.2 cM    126.3 cM     11.0 cM    10.9 %%     8.6
 11  75 T018          19.3 cM    145.6 cM     16.6 cM    16.0 %%     6.5
 12  85 T035           0.0 cM    145.6 cM      0.0 cM     0.0 %%    21.4
 13  62 L078          14.5 cM    160.0 cM     12.8 cM    12.6 %%     9.0
 14  42 L001          19.4 cM    179.4 cM     16.6 cM    16.1 %%     6.0
 15  38 L010        ----------              ----------
                     179.4 cM                156.8 cM


       15 markers, log10-likelihood =  -169.89
                   log-e-likelihood =  -391.19

Map  1 : log10-likelihood =   -72.96, log-e-likelihood =  -167.99
-------:

Data Set Number  1 :

      Markers        Distance    Cumulative  Distance   Theta       2pt
Pos  Id name         Haldane     Haldane     Kosambi    (%%age)      LOD

  1  20 L029           0.0 cM      0.0 cM      0.0 cM     0.0 %%    18.1
  2  38 L010           5.9 cM      5.9 cM      5.6 cM     5.6 %%    13.1
  3  62 L078           0.0 cM      5.9 cM      0.0 cM     0.0 %%    21.4
  4  85 T035           2.8 cM      8.7 cM      2.7 cM     2.7 %%     9.6
  5 239 D022          12.7 cM     21.4 cM     11.4 cM    11.2 %%     6.4
  6 277 A059           1.1 cM     22.5 cM      1.1 cM     1.1 %%    19.9
  7  42 L001           3.4 cM     25.9 cM      3.3 cM     3.3 %%    16.8
  8 284 A079           0.0 cM     25.9 cM      0.0 cM     0.0 %%    21.7
  9 132 M030           3.4 cM     29.3 cM      3.3 cM     3.3 %%    16.0
 10 220 M232           1.1 cM     30.4 cM      1.1 cM     1.1 %%    17.8
 11  75 T018           4.7 cM     35.1 cM      4.5 cM     4.5 %%    12.8
 12 186 M237           0.0 cM     35.1 cM      0.0 cM     0.0 %%    19.9
 13  99 M076           5.9 cM     41.0 cM      5.6 cM     5.6 %%    13.0
 14 255 A036           0.0 cM     41.0 cM      0.0 cM     0.0 %%    21.4
 15  94 M034        ----------              ----------
                      41.0 cM                 38.6 cM


       15 markers, log10-likelihood =   -72.96
                   log-e-likelihood =  -167.99

Map  2 : log10-likelihood =   -70.86, log-e-likelihood =  -163.17
-------:

Data Set Number  1 :

      Markers        Distance    Cumulative  Distance   Theta       2pt
Pos  Id name         Haldane     Haldane     Kosambi    (%%age)      LOD

  1  20 L029           0.0 cM      0.0 cM      0.0 cM     0.0 %%    18.1
  2  38 L010           5.9 cM      5.9 cM      5.6 cM     5.6 %%    13.1
  3  62 L078           0.0 cM      5.9 cM      0.0 cM     0.0 %%    21.4
  4  85 T035           2.5 cM      8.5 cM      2.5 cM     2.5 %%     9.6
  5 239 D022          11.5 cM     19.9 cM     10.4 cM    10.2 %%     6.4
  6  42 L001           1.1 cM     21.0 cM      1.1 cM     1.1 %%    19.9
  7 277 A059           2.2 cM     23.3 cM      2.2 cM     2.2 %%    18.4
  8 284 A079           0.0 cM     23.3 cM      0.0 cM     0.0 %%    21.7
  9 132 M030           3.4 cM     26.7 cM      3.3 cM     3.3 %%    16.0
 10 220 M232           1.1 cM     27.8 cM      1.1 cM     1.1 %%    17.8
 11  75 T018           4.7 cM     32.5 cM      4.5 cM     4.5 %%    12.8
 12 186 M237           0.0 cM     32.5 cM      0.0 cM     0.0 %%    19.9
 13  99 M076           5.9 cM     38.4 cM      5.6 cM     5.6 %%    13.0
 14 255 A036           0.0 cM     38.4 cM      0.0 cM     0.0 %%    21.4
 15  94 M034        ----------              ----------
                      38.4 cM                 36.2 cM


       15 markers, log10-likelihood =   -70.86
                   log-e-likelihood =  -163.17

        EM calls:
           Set  1 : 36 (33,0)
        CPU Time (secs): 0.17
        Maps within -3.0: 2
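
The columns of this listing are related by simple formulas. The Python sketch below assumes that the two likelihood values differ only by the base of the logarithm, and that the Haldane and Kosambi columns are obtained from the recombination fraction Theta with the standard mapping functions; the function names are hypothetical. Since the printed values are rounded, agreement can only be expected up to rounding.

```python
import math


def log10_to_ln(log10_likelihood):
    """Convert a log10-likelihood into a natural-log (log-e) likelihood."""
    return log10_likelihood * math.log(10)


def haldane_cm(theta):
    """Haldane map distance in cM for a recombination fraction theta < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def kosambi_cm(theta):
    """Kosambi map distance in cM for a recombination fraction theta < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


# First interval of Map 0 (L029 to A079): Theta is printed as 22.5 %.
theta = 0.225
ln_like = log10_to_ln(-169.89)
d_haldane = haldane_cm(theta)
d_kosambi = kosambi_cm(theta)
```

For this interval the computed values are close to the printed 29.9 cM (Haldane) and 24.3 cM (Kosambi), and the converted likelihood is close to the printed -391.19.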

So far there are only our 3 maps in the heap. We can try to build other maps using a smarter heuristic procedure called build. This command inserts markers incrementally, always choosing the best loglikelihood and the best insertion point. Because such a procedure is ``greedy'' in its choices, it can be run in parallel on several maps, always keeping the best maps. The command therefore takes one parameter that specifies the number of maps built at the same time.

CG> build 10

Build(10) : |||||||||||||||

Map  5 : log10-likelihood =   -70.86
-------:
 Set : Marker List ...
   1 : L029 L010 L078 T035 D022 L001 A059 A079 M030 M232 T018 M237 M076 A03...
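
The idea of growing several maps in parallel can be sketched as follows in Python. This is a simplified illustration, not the build command itself: markers are inserted in a fixed order (whereas build also chooses which marker to insert), the function names are hypothetical, and the score is a toy stand-in (minus the total length of the order on made-up positions) for the multipoint log-likelihood.

```python
def build_sketch(markers, score, width):
    """Insert markers one at a time at every possible position,
    keeping only the `width` best partial orders after each step."""
    beam = [(markers[0],)]
    for m in markers[1:]:
        candidates = set()
        for order in beam:
            for i in range(len(order) + 1):
                new = order[:i] + (m,) + order[i:]
                # An order and its reverse describe the same map:
                # keep a single canonical representative.
                candidates.add(min(new, new[::-1]))
        # Best score first; ties broken by the order itself.
        beam = sorted(candidates, key=lambda o: (-score(o), o))[:width]
    return beam


# Made-up marker positions, used only to define a toy score.
positions = {"a": 0.0, "b": 5.0, "c": 9.0, "d": 20.0}


def toy_score(order):
    """Minus the summed distance between adjacent markers (higher is better)."""
    return -sum(abs(positions[x] - positions[y])
                for x, y in zip(order, order[1:]))


maps = build_sketch(["c", "a", "d", "b"], toy_score, width=2)
```

A larger width costs more evaluations but reduces the risk that an early greedy choice discards the order that would have turned out best.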

No better map was found. We may now shift to so-called ``improving'' methods. These methods cannot start from scratch and are dedicated to improving available maps.

Thomas Schiex 2009-10-27