European Molecular Biology Computing Network - Biocomputing Tutorials

OMIM: Asking Simple Questions


Searching OMIM

OMIM was originally a book (Mendelian Inheritance in Man) of short review articles on genes and genetically based traits of medical importance. Most of its information is text, so that relevant articles can be retrieved from the on-line version using simple searches for keywords. We'll begin our exploration of OMIM by looking for the human gene(s) coding for the enzyme "alpha glycerol phosphate dehydrogenase", or "alpha GPDH".

Exercise OMIM - Asking Simple Questions 1a: use keywords to find an article
In the frame below (or in the other web browser) there is a text entry box beside "Enter one or more search keywords:". Enter the two keywords: alpha GPDH.
Click on the "Submit Search" button and wait for the result.

  • OMIM finds NO documents; this seems wrong! The gene coding for the enzyme I know as "alpha GPDH" has been cloned and sequenced in many organisms. Those of us lacking it are probably dead. It must be there, but perhaps with a slightly different name.
    Click on the " Search " button to go back and try again. This time try a search with only alpha as the keyword.

  • Now OMIM finds 1218 documents, showing titles for the first 50. A quick scan down the list reveals that "alpha GPDH" - or anything similarly named - isn't among them. It must be farther down the list.

    This pair of results shows an important feature of the OMIM "search engine": when a single keyword gives many results but two keywords give none, the implicit connector between keywords is "AND". OMIM only reports documents having all keywords present. This makes it easy to be precise with a query, but also to be over-precise, so no results are returned!
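The behaviour just described can be imitated in a few lines of Python. This is only a sketch of an implicit "AND" over a toy collection: the three documents, the `words` helper and the `search_all` function are invented for illustration and say nothing about how OMIM itself is implemented.

```python
# Toy illustration of an implicit "AND" between keywords.
# The documents below are invented for this sketch; they are not real OMIM entries.
DOCS = {
    "doc1": "alpha globin locus",
    "doc2": "glycerol-3-phosphate dehydrogenase-1; GPD-1 (alpha-GPD)",
    "doc3": "beta galactosidase deficiency",
}


def words(text):
    """Split text into a set of lowercase words, treating punctuation as separators."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return set(cleaned.split())


def search_all(keywords, docs=DOCS):
    """Return the IDs of documents that contain every keyword (implicit AND)."""
    wanted = {k.lower() for k in keywords}
    return sorted(doc_id for doc_id, text in docs.items() if wanted <= words(text))


print(search_all(["alpha"]))          # ['doc1', 'doc2']
print(search_all(["alpha", "GPDH"]))  # []
```

One keyword gives several hits, but adding a second keyword that appears nowhere empties the result, which is the same pattern we saw with alpha and alpha GPDH.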

    With this knowledge, we have two options for the next search.

    1. Get ALL 1218 documents retrieved with alpha, and then scan through this list for "GPD" or "phosphate" or "glycerol". This option could be quite time-consuming for us.
    2. Get fewer documents by using another keyword or two together with alpha, and hope that the documents for "alpha GPDH" are in the first 50. This option is more likely to be time-consuming for the computer!

    Exercise OMIM - Asking Simple Questions 1b: use more keywords to refine a query
    Click on the " Search " button to try again, using alpha phosphate as the keywords.

  • OMIM finds 60 documents; better, but still too many to browse until we find what we want.

    Another method to refine a query is to restrict the search to only one or two of the many sections that make up OMIM documents. These sections, called fields, hold specific types of information: e.g., bibliographic data, known alleles, clinical synopses, etc. (NB: 7 to 14 fields of information are possible in an OMIM document. There is more on all the available fields in Part 2 - OMIM Record Format.)

    Exercise OMIM - Asking Simple Questions 2: use field restriction to refine a query
    Try again, using alpha phosphate as the keywords, and clicking in the boxes next to "Title:" and "Text:". These boxes are just below the keyword text entry box.

  • OMIM finds 36 documents; many are still irrelevant, but the list is almost manageable for a quick scan to find the ones we are after.
    As a final attempt, forget "alpha" and search for the other keywords glycerol phosphate dehydrogenase, only in the title field.

  • Congratulations! OMIM returns 2 documents, both relevant. A few rounds of progressive query refinement and juggling possible keywords has brought us to our goal.

    The two documents are titled:

      *138420 GLYCEROL-3-PHOSPHATE DEHYDROGENASE-1; GPD-1
      *138430 GLYCEROL-3-PHOSPHATE DEHYDROGENASE-2; GPD-2

    Could we have found them by broadening our first, failed query (alpha GPDH, in all fields) instead of by progressive query refinement? Notice that if we had used only GPD as the second keyword, this first query would have succeeded. When you are uncertain of part of a keyword, OMIM permits use of the "*" wildcard, a special character that matches any run of characters, including none.

    Exercise OMIM - Asking Simple Questions 3: use wildcards to broaden a query
    Try the first query again, using alpha GPD* as the keywords.

  • OMIM finds two documents, the second of which is relevant.
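To see why GPD* succeeds where GPDH failed, here is a small Python sketch of "*" matching. It assumes the wildcard behaves as described above (any run of characters, including none) and that matching ignores case; the function names `wildcard_to_regex` and `matches` are made up for this example and are not part of OMIM.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a keyword containing '*' into a case-insensitive regular expression."""
    escaped = re.escape(pattern)  # neutralise every special character, including '*'
    return re.compile(escaped.replace(r"\*", ".*"), re.IGNORECASE)


def matches(pattern, word):
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


for word in ["GPD", "GPDH", "GPD1", "GAPDH"]:
    print(word, matches("GPD*", word))
# GPD True
# GPDH True
# GPD1 True
# GAPDH False
```

The pattern GPD* accepts "GPD" itself as well as longer words starting with it, so a document that spells the name differently from our first guess is no longer excluded.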

    I mentioned earlier that the implicit connector between keywords used in a query was "AND". OMIM allows other connectors as well ("OR" & "AND NOT"), plus the use of parentheses to form complicated query expressions. For example,

    Query Expression               Searches for documents matching ...
    cancer & oncogene              BOTH cancer and oncogene
    cancer & oncogen*              BOTH cancer and at least one of oncogene, oncogenic, etc.
    cancer | oncogene              EITHER cancer OR oncogene (OR BOTH)
    cancer - oncogen*              cancer BUT NONE of oncogene, oncogenic, etc.
    cancer - oncogen* | renal      EITHER (cancer BUT NONE of oncogene, oncogenic, etc.) OR renal (OR BOTH)
    cancer - (oncogene | renal)    cancer BUT NEITHER oncogene NOR renal
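The connectors in the table above behave like set operations on the lists of documents that contain each keyword. The sketch below uses Python sets with invented document numbers to mirror each row. Python happens to group `-` before `|`, which matches the grouping shown in the table, but that is a property of Python, not a statement about how OMIM parses queries.

```python
# Each keyword maps to the set of (made-up) document numbers that contain it.
cancer = {1, 2, 3, 4}
oncogene = {2, 5}
oncogenic = {3}
renal = {4, 6}

# "oncogen*" matches oncogene, oncogenic, etc., so it is the union of those sets.
oncogen_star = oncogene | oncogenic

both = cancer & oncogene                         # cancer & oncogene
either = cancer | oncogene                       # cancer | oncogene
without = cancer - oncogen_star                  # cancer - oncogen*
without_or_renal = cancer - oncogen_star | renal # cancer - oncogen* | renal
neither = cancer - (oncogene | renal)            # cancer - (oncogene | renal)

print(sorted(both))              # [2]
print(sorted(either))            # [1, 2, 3, 4, 5]
print(sorted(without))           # [1, 4]
print(sorted(without_or_renal))  # [1, 4, 6]
print(sorted(neither))           # [1, 3]
```

Note how the parentheses in the last row change the answer: document 4 mentions renal, so it survives in the fifth query but is excluded in the sixth.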

    Exercise OMIM - Asking Simple Questions supplement: On your own ...
    How many documents are retrieved with:
    alpha & dehydrogenase & (glycerol | phosphate)
    glycer* & (dehydrogenase | phosphate)
    GPD* - blood

    When the OMIM "search engine" finds more than one document for a particular query, the list of results is ranked by the order of keyword count. Documents with more instances of the keywords appear higher in the list. While this is effective in most cases, it sometimes acts to promote longer, irrelevant documents over pertinent short ones. Further, the effect is compounded as more and more keywords are used. One quick and crude solution is to omit the longest section of most OMIM documents - the Text: field - from the search. But this has the drawback that many valid responses may be filtered out completely.
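The ranking problem is easy to reproduce. The sketch below assumes the simplest possible scoring, a raw count of keyword occurrences, which may differ in detail from what OMIM actually does; the two documents and the names `keyword_count` and `rank` are invented for the example.

```python
def keyword_count(text, keywords):
    """Count how many times any of the keywords occurs in the text."""
    words = text.lower().split()
    return sum(words.count(k.lower()) for k in keywords)


def rank(docs, keywords):
    """Order document IDs by keyword count, highest first."""
    return sorted(docs, key=lambda doc_id: keyword_count(docs[doc_id], keywords), reverse=True)


# Two made-up documents: a short one on the topic and a long one that only touches it.
docs = {
    "short_relevant": "renal cancer",
    "long_tangential": (
        "cancer of the breast and cancer of the colon are discussed "
        "with a note on renal function and cancer risk"
    ),
}

print(rank(docs, ["renal", "cancer"]))  # ['long_tangential', 'short_relevant']
```

The long document wins simply because it repeats "cancer" more often, even though the short one is exactly on topic.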

    A more refined strategy is to specify the field(s) to be searched for each keyword in the query. For example, when looking for reviews of renal cancer citing Bert Vogelstein's work, a possible query is:

    renal[text] & cancer[text] & Vogelstein[reference]

    Alternatively, you may click the Text: field restriction box to limit the search for renal & cancer, and still specify only the reference field for Vogelstein.
    renal & cancer & Vogelstein[reference]

    Exercise OMIM - Asking Simple Questions 4: restricting searches for particular keywords
    Search for "glycer* & phosphate" in all fields, with either "alpha" or "dehydrogenase" only in the title field.
    glycer* & phosphate & (alpha[title] | dehydrogenase[title])

  • OMIM finds 7 documents.
    Search only the Text: and References: fields for "glycer* & phosphate", keeping the search for "alpha" or "dehydrogenase" limited to the title field.
    glycer* & phosphate & (alpha[title] | dehydrogenase[title])
       (clicking in both the Text: and References: boxes)

  • OMIM finds the same 7 documents. Notice that if you had tried to restrict the fields searched for the keywords at the level of each keyword, as in
    glycer*[text] glycer*[reference] phosphate[text] phosphate[reference] & (alpha[title] | dehydrogenase[title])

    the query is different. The implicit connector between keywords is "AND", while the clickable boxes force the use of "OR" between the same keyword sought in multiple fields. The equivalent query, using field restriction at the keyword level is the rather complicated

    (glycer*[text] | glycer*[reference]) & (phosphate[text] | phosphate[reference]) & (alpha[title] | dehydrogenase[title])
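The expansion that the clickable boxes perform can be written out mechanically. The helper below is a hypothetical illustration (the name `expand` is made up, and it only builds a query string in the keyword[field] syntax shown above): each keyword is "OR"-ed across the ticked fields, and the resulting groups are joined with "AND".

```python
def expand(keywords, fields):
    """Build the explicit query equivalent to ticking `fields` for every keyword."""
    groups = []
    for keyword in keywords:
        # The same keyword in several fields is joined with OR ...
        alternatives = " | ".join(f"{keyword}[{field}]" for field in fields)
        groups.append(f"({alternatives})" if len(fields) > 1 else alternatives)
    # ... while different keywords are joined with AND.
    return " & ".join(groups)


print(expand(["glycer*", "phosphate"], ["text", "reference"]))
# (glycer*[text] | glycer*[reference]) & (phosphate[text] | phosphate[reference])
```

The length of the expression grows with the number of keywords times the number of fields, which is why typing it by hand quickly becomes tedious.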

    Take home message - use the clickable boxes whenever possible!
    As a final exercise, search the Text: and References: fields for "glycer* & phosphate", further requiring the presence of "dehydrogenase" but the absence of "alpha" in the title field.
    glycer* phosphate dehydrogenase[title] - alpha[title]
       (clicking in both the Text: and References: boxes)

  • OMIM finds the same 7 documents again. Notice that keyword level field restriction fails when applied to a query expression (or sub-expression). The query
    glycer* phosphate (dehydrogenase - alpha)[title]

    doesn't work the way we want it to!

    To summarise, finding relevant documents from OMIM means composing queries for OMIM's search engine. A query may be as simple as a keyword or two, or a complicated expression, using tricks to filter out responses (i.e., search field restriction, keyword connectors, & parentheses), and a trick to include more responses (i.e., the "*" wildcard character). One of the wonderful aspects of the OMIM search engine (and of computers in general) is that it is tolerant of mistakes; once it has made your error come true for you, it always waits for you to try again!

    Exercise OMIM - Asking Simple Questions 5: practicing query expressions
    Create query expressions to answer the following:
    • How many documents have the words "retinitis pigmentosa" in their titles?
    • How many documents refer to work by Bert Vogelstein? How many documents also about cancer refer to his research? How many documents about cancer do NOT reference his papers?
    • What are the types of muscular dystrophy, other than Duchenne?
    • Does the context in which one sneezes have a genetic component?
    • How many documents refer to chromosomal aberrations (inversions, translocations, heterochromatinisation, ...)?

    Please continue with Part 2 - OMIM Document Format.

    Comments? Questions? Accolades?
    Please send them to David Featherston   ( dwf@biobase.dk )
    Updated on Tuesday, 19 November, 1996
    Copyright © 1995-1996 by David W. Featherston