BLAST Memory Requirements

Memory Requirements for the Classical Ungapped BLAST Algorithm

Several characteristics of a BLAST search go into determining its heap memory requirements. As BLAST is currently implemented, memory requirements differ significantly from those of earlier versions of the software. Today, the memory required is often dominated by a linear function of the length of the query sequence (with a proportionality constant of at least 5, but potentially many-fold larger), plus up to 2 times the length of the longest database sequence (to accommodate its full-length translation into protein in the TBLASTN and TBLASTX search modes). The database component should also be multiplied by the number of CPUs or threads employed.

For a rote implementation of the "classical" ungapped BLAST algorithm (Altschul et al., 1990), contributors to memory use include the query sequence, the database sequences, per-thread working arrays whose size depends on the query and database lengths, and a word index table used to seed the search.

As an example, not counting the memory required to save intermediate results and for post-processing, the storage requirement (in bytes) for a classical BLASTN search is approximately: 5SQ + C[8S(Q+D) + D] + B, where S=1 or S=2 depending on whether one or both strands of the query are searched. In this example, B will be no more than (and often much less than) P·4^W. Substituting S=2 and C=1, a typical BLASTN search using both strands of the query and database on one processor or thread will require at least 26Q + 17D + B bytes. If an additional processor or thread is used, the minimum memory requirement will increase by 16Q + 17D, for a total of 42Q + 34D + B bytes.
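To make the arithmetic concrete, the approximation above can be restated as a small helper function. The following Python sketch simply evaluates 5SQ + C[8S(Q+D) + D] + B; the function name and the illustrative values of Q, D, and B are invented for this example and are not part of any BLAST distribution.

    def blastn_heap_estimate(q_len, d_len, b_bytes, strands=2, threads=1):
        """Approximate heap requirement (bytes) for a classical BLASTN search,
        per the formula 5SQ + C[8S(Q+D) + D] + B described in the text.
        q_len   -- query length, Q
        d_len   -- database sequence length, D
        b_bytes -- word index storage, B (bounded above by P*4**W)
        strands -- S, 1 or 2 query strands searched
        threads -- C, number of CPUs/threads used
        """
        s, c = strands, threads
        return 5 * s * q_len + c * (8 * s * (q_len + d_len) + d_len) + b_bytes

    # Both strands on one thread reduces to 26Q + 17D + B, as stated above.
    q, d, b = 10_000, 3_000_000, 16 * 2**20    # illustrative values only
    assert blastn_heap_estimate(q, d, b) == 26 * q + 17 * d + b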

Clearly, memory requirements can be greatly reduced by limiting the number of threads employed (C) on multiprocessor systems. The default behavior is to use all available processors (one thread per processor) in the case of BLASTP, BLASTX, TBLASTN and TBLASTX; and up to 4 processors in the case of BLASTN. This default behavior can be altered in a local file named /etc/sysblast. (An example sysblast.sample file is provided in licensed BLAST 2.0 software distributions). In this regard, the most efficient use of computing resources will be obtained by limiting individual BLAST jobs to just a single thread, so that the computational overhead of thread creation, synchronization, and destruction is avoided.

When activated on a computer system, Intel HyperThreads typically appear as separate processors to application programs like BLAST, so BLAST can spawn an additional thread of execution for each HyperThread. Use of HyperThreads often speeds up a search, but with the attendant increase in memory required for each additional thread.

Further memory savings can be had by requesting that just one strand be searched at a time, because the default behavior is to search both strands of the query and/or database. While the -mmio option does not reduce heap storage requirements, it conserves processor address space, which may then allow more of the address space to be utilized for heap memory allocation; the -mmio option is not recommended for general use, however, because memory-mapped I/O is faster than buffered I/O.

Sufficient real memory should be provided to the search programs so that they can run without spilling over into virtual memory swap storage, as hitting disk can be disastrous to BLAST performance.

Long queries and database sequences may exceed the limits of 32-bit virtual addressing and require 64-bit processing. On computing platforms that support 64-bit virtual addressing, WU BLAST is often available with both 32-bit and 64-bit virtual addressing, as separate software distributions. While 32-bit virtual addressing may at first seem outdated and a waste of a 64-bit computer, if a given problem can be computed within 32-bit limits, less memory will be required and execution will often proceed significantly faster than it would in 64-bit mode. Even when mere 32-bit virtual addressing suffices, a 64-bit computer can still offer significant advantages, in that the 64-bit system may be configurable with more memory in which to cache database files (see below), in which to utilize more threads on a given BLAST job, or in which to run more BLAST (and other) jobs simultaneously.
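As a rough rule of thumb (and only that), one can compare an estimated heap requirement against the 32-bit limit before choosing a distribution. The sketch below assumes a conservative usable ceiling of about 3 GB for a 32-bit process, since the practical per-process limit varies by operating system; the ceiling and headroom factor are illustrative assumptions, not WU BLAST parameters.

    # Conservative, assumed ceiling for a 32-bit process; actual usable address
    # space is OS-dependent (commonly 2-3 GB of the 4 GB theoretical maximum).
    ADDRESS_SPACE_32BIT = 3 * 2**30

    def fits_in_32bit(estimated_heap_bytes, headroom=1.25):
        """True if the estimate, padded with headroom for intermediate results,
        I/O buffers, etc., stays under the assumed 32-bit ceiling."""
        return estimated_heap_bytes * headroom < ADDRESS_SPACE_32BIT

    print(fits_in_32bit(68_000_000))    # e.g., an estimate from the formula above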

For more information on 32-bit versus 64-bit computing and the proper configuration of operating systems to support large memory, please see http://blast.wustl.edu/blast/KernelTuning.html.

Database File Caching

Beyond the above requirements for program storage, any additional memory available may improve BLAST performance through caching of database files in what would otherwise be unused memory. When the same database(s) are to be searched repeatedly (e.g., by an automated analysis pipeline), caching of the database files avoids the latency of disk I/O and potential contention between different processes for the same disk resources.

Files are usually cached by the operating system in a FIFO (first in/first out) manner. If only enough memory is available to cache the files for a subset of the databases, then file caching will not be effective when all of the databases are searched in turn. In such a case, overall system throughput will likely improve if the job stream can be structured to search all queries against one cacheable database subset before proceeding to search the next cacheable subset, and so on until all of the required databases have been searched. In this manner, the database files can generally be cached, aside from the very first time each database is searched, which primes the cache.
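One way to realize this ordering in an automated pipeline is sketched below. Here run_search() is only a placeholder for however searches are actually launched at a given site (queueing system, subprocess call, etc.), and the database and query names are invented for illustration.

    def run_search(database, query):
        # Placeholder: submit or execute one BLAST search of 'query' vs 'database'.
        ...

    databases = ["est_human", "est_mouse"]            # illustrative names
    queries   = ["q0001.fa", "q0002.fa", "q0003.fa"]  # illustrative names

    for db in databases:      # outer loop: one cacheable database at a time
        for q in queries:     # inner loop: all queries against it before moving on
            run_search(db, q) # after the first query primes the cache, subsequent
                              # searches read the database files from memory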

How much additional memory is useful for file caching? Typical BLAST searches involve a sequential search through an entire database. Each search then requires that the entirety of the ntdb.xns or aadb.xps file be read, in addition to the associated .x[np]t file. For any database hits, the associated .x[np]d file will be read to obtain the sequence descriptions. Sufficient memory should be available to cache the .x[np]s and .x[np]t files, plus some additional memory for perhaps a few thousand database descriptions. Adding memory is unlikely to improve performance if there will still be insufficient memory with which to cache the entire .x[np]s and .x[np]t files, due to the FIFO nature of cache management.
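A quick way to gauge this is to sum the sizes of the .x[np]s and .x[np]t files and compare the total against the memory expected to be free for caching. The sketch below follows the file-naming pattern described above; the free-memory figure must be supplied by the user (e.g., from free(1) or vmstat), and the small allowance for cached descriptions is an arbitrary assumption.

    import os

    def database_cacheable(db_basename, free_bytes, protein=False):
        """Roughly check that one database's sequence (.x[np]s) and table
        (.x[np]t) files fit in 'free_bytes' of otherwise-unused memory,
        leaving a little slack for descriptions read from the .x[np]d file."""
        kind = "p" if protein else "n"
        needed = sum(os.path.getsize(db_basename + ext)
                     for ext in (".x" + kind + "s", ".x" + kind + "t"))
        slack = 8 * 2**20    # assumed allowance for a few thousand descriptions
        return needed + slack <= free_bytes

    # Example: can ~2 GB of free memory cache the files of database "ntdb"?
    print(database_cacheable("ntdb", 2 * 2**30))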

File caching occurs automatically with contemporary operating systems, albeit with varying degrees of effectiveness. For instance, at least some versions of Linux on IA32 platforms seem to have difficulty caching "large" files (files on the order of 1 GB or larger). One should also be wary of other jobs executing simultaneously with BLAST, whose activity may purge the BLAST database files from the cache. If other jobs besides BLAST are active on the computer system, sufficient memory should be provided for them to function in memory as well, on top of the requirements for the BLAST algorithm and BLAST database file caching.

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-410.



Last updated: 2004-06-02

Copyright © 2004 Warren R. Gish, Saint Louis, Missouri 63108 USA. All rights reserved.