Road map to BEAP

Overview

Because many BEAP options are inherently flexible, we tested a number of scenarios that could alter BEAP performance. To determine how BEAP output changed when BLAST settings were altered, we varied the E-value stringency, the word size, and the number of databases. To determine how different sequence attributes affected BEAP performance, we compared output generated using intronic sequence, exonic sequence, exonic plus untranslated region (UTR) sequence, and template sequence with varying levels of repetitive sequence elements. To determine how template sequence size altered BEAP performance, we tested individual template sequence sizes, the number of sequences used as template, and the total amount of sequence used as template. Last, we investigated the difference in BEAP output when using local (megaBLAST) versus network BLAST queries.

All trial runs of BEAP used the same settings, with the exception of the factor being tested. The same template sequence was used in each case, except in tests of repetitive sequence content. Tests of increased sequence size used the same sequence as the other trials, extended with additional contiguous sequence when larger sequences were tested. Default test settings used network BLAST to query six bovine databases at NCBI: BAC end sequence, trace expressed sequence tagged sites, trace other sequence, trace whole genome sequence, high throughput genome sequence, and EST. The E-value was set to e-30 except when the effect of E-value was being tested. The databases queried using megaBLAST were always the same: the Bos taurus unique sequence and Bos taurus tiger gene databases.

In some cases, following initial tests, the E-value was adjusted to attempt to force BEAP to run when no result could be obtained. These changes are noted and reported only when the initial test failed to create contigs and the revised method was successful.
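The one-factor-at-a-time design described above can be sketched in a few lines of Python. This is only an illustration of the test layout, not BEAP's own configuration format: the class name, field names, and the example E-values are hypothetical, and the default word size is left unset because the text does not state it.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

# The six bovine NCBI databases listed as the default for network BLAST.
DEFAULT_DATABASES: Tuple[str, ...] = (
    "BAC end sequence",
    "trace expressed sequence tagged sites",
    "trace other sequence",
    "trace whole genome sequence",
    "high throughput genome sequence",
    "EST",
)


@dataclass(frozen=True)
class TrialSettings:
    """Hypothetical container for the settings of one BEAP trial run."""
    blast_mode: str = "network"            # "network" or "local" (megaBLAST)
    evalue: float = 1e-30                  # default E-value given in the text
    word_size: Optional[int] = None        # not stated in the text; None = unspecified
    databases: Tuple[str, ...] = DEFAULT_DATABASES


def one_factor_trials(base: TrialSettings, factor: str, values) -> List[TrialSettings]:
    """Return one settings object per value, changing only the named factor."""
    return [replace(base, **{factor: value}) for value in values]


if __name__ == "__main__":
    base = TrialSettings()
    # Example E-values are placeholders, not the values used in the study.
    for trial in one_factor_trials(base, "evalue", [1e-10, 1e-50]):
        print(trial.evalue, trial.blast_mode, len(trial.databases))
```

Keeping the baseline immutable and deriving each trial from it makes it harder to change a second factor by accident, which is the property the trial design relies on.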

Template Sequence

The user must define the appropriate template sequence and species. BLAST uses the template sequence to query the species of interest, much as primers are used in PCR. Cross-species comparative maps (e.g. radiation hybrid (RH) maps) can be used to identify syntenic sequence blocks between species and so find a suitable template sequence.

Retrieval of template sequence

In our test case, we defined the "best" template sequence as that obtained from the Human-Bovine RH map. Bovine genetic markers were used to find the human syntenic gene block that corresponded to the bovine chromosomal block of interest. Genes and pseudogenes were deemed template sequence, because conservation between species is greatest in genic regions. Human gene sequences were used as the "template" to identify the orthologous genes in cattle. All template sequences were retrieved from the UCSC "golden path" genome browser at http://genome.ucsc.edu/ and the ENSEMBL database at http://www.ensembl.org/.

Repetitive sequence elements are often a hindrance to genomic assembly programs because they occur at multiple sites across the genome with wide variation in flanking genomic sequence. Repetitive sequences in the template (i.e. the human sequence) were masked using RepeatMasker software prior to use of BEAP. This was easily done using the repeat masking feature when retrieving template sequence with the table browser at the UCSC website.

Cattle (Bos taurus) sequence was used in all tests of BEAP performance. The sequences obtained in the application to the bovine dwarfism locus used the sequence databases available in 2005, prior to full assembly. The BEAP performance trials utilized the whole bovine genome sequence, version 3, from 2007.
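As a small illustration of the masking step, the sketch below assumes the template was downloaded with repeats marked in lower case (soft-masked), which is one of the masking choices offered for sequence output at UCSC. It converts those bases to N and reports how much of the template is masked. The function names are hypothetical and are not part of BEAP or RepeatMasker.

```python
def hard_mask(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with N, leaving other bases unchanged."""
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are masked, counting lowercase bases and N."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base == "N")
    return masked / len(seq)


if __name__ == "__main__":
    template = "ACGTacgtacgtGGCC"   # toy sequence, not real template data
    print(hard_mask(template))
    print(masked_fraction(template))
```

A template with a high masked fraction leaves little sequence for BLAST to seed on, so checking this value before a run is a cheap sanity test.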

Use Case: An Application

Use BEAP to construct contigs within the Angus dwarfism locus

Fine-mapping of the Angus dwarfism locus resulted in a critical region of roughly 1-2 Mbp on Bos taurus autosome 6. Because the bovine genome had not been fully sequenced when BEAP was first applied, many of the candidate genes in this genomic region were unknown and unannotated. The Human-Bovine RH map was used to define the template sequence, allowing for some extra sequence proximal and distal to the homologous bovine chromosome block. Homo sapiens autosome 4 genomic DNA sequence from 78,000,000 to 83,000,000 base pairs was defined as the "template" for BEAP assembly of the bovine sequence. This genomic block contained 20 genes and pseudogenes. The template sequence used by BEAP included both exonic and UTR sequences. We used RH markers within genes in both the bovine and human genome builds to anchor one map to the other. An E-value of e-30 was used for all tests. The databases tested were the same six listed above as the default for the network BLAST tests.
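For orientation, the template interval given above can be written as a genome-browser position string and its size checked against the critical region. The sketch below is illustrative only: the Region class is hypothetical, the chromosome label "chr4" follows the UCSC naming style, and the span is computed as a simple difference because the text does not say whether the end coordinate is inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical holder for a genomic interval."""
    chrom: str
    start: int
    end: int

    def span(self) -> int:
        """Approximate length in base pairs (end minus start)."""
        return self.end - self.start

    def position(self) -> str:
        """Browser-style position string, e.g. chr4:78,000,000-83,000,000."""
        return f"{self.chrom}:{self.start:,}-{self.end:,}"


# Human template interval stated in the text.
TEMPLATE = Region("chr4", 78_000_000, 83_000_000)

if __name__ == "__main__":
    print(TEMPLATE.position())
    print(TEMPLATE.span())
```

The template spans about 5 Mbp, several times the 1-2 Mbp critical region, which reflects the extra proximal and distal sequence included around the homologous block.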