WU BLAST 2.0 Parameters

Command Line Options

The complete list of command line options supported by WU-BLAST 2.0 is provided in the tables below. The information presented here is their definitive description and should be considered valid only for the current (most recent) version of the software. If you find an inconsistency between the advertised behavior and the actual behavior of the software, first be sure you are using the latest version, as indicated by the date of the latest release shown at http://blast.wustl.edu. If the inconsistency persists after upgrading, please report it. If you wish to continue using an older version of the software instead of upgrading, please consult the copy of parameters.html that came bundled with that version; it may be more accurate for your purposes than the on-line documentation. For most of the options, a logical diagram indicates where each has its effect.

When this web page cannot be conveniently accessed, terse descriptions of most items may be obtained by entering the relevant BLAST program name on the command line without any arguments. A copy of this parameters.html web page is bundled with the licensed software, as well. Where differences arise between the bundled file and the on-line version, they may be due to differences in the software at the time of release. The most recent version of the page you are viewing is located here. A PDF version of the page is available here.

Command line options for the obsolete WU- and NCBI-BLAST version 1.4, first released in 1994, often apply unchanged to WU-BLAST 2.0, which yields a high degree of upward compatibility. While BLAST 1.4 is many years old now, if you are interested in it, e.g., for reasons of reproducing prior results, please see the BLAST 1.4 manual page in PDF format.

Command Line Syntax

Aside from the first two command line arguments (database name and query filename), which are required items, the WU-BLAST search programs support a flexible syntax for command line options and parameters. Parsing of the command line is generally case-insensitive. A leading hyphen (-) is unnecessary on option names but may improve their readability. Parameter values can optionally be specified using an equals sign (=). Combined use of hyphens and equals signs is allowed and does not need to be consistently applied throughout a given command line. Large integer values can be specified using floating point representation (e.g., 1e9 instead of 1000000000). For parameters with single letter names, neither a hyphen nor an equals sign is necessary.

The basic command line syntax is:

	<program> <database> <query> [options...]

where <program> is one of blastp, blastn, blastx, tblastn and tblastx; <database> is the name of the database to search (previously formatted with xdformat); <query> is the name of a file containing one or more query sequences in FASTA format; and [options...] is a list of zero or more command line options and parameter settings.

As examples of the command line flexibility available, the following command lines are all valid and equivalent:

      blastp nr myquery.aa  v=10  b=100  filter=seg    e=1e-10    nogaps
      blastp nr myquery.aa  V=10  B=100  filter=seg    E=1e-10    nogaps
      blastp nr myquery.aa -V=10 -B=100 -filter=seg   -E=1e-10   -nogaps
      blastp nr myquery.aa -V10  -B100  -filter seg   -E1e-10    -nogaps
      blastp nr myquery.aa -V10  -B100  -filter "seg" -E1e-10    -nogaps
      blastp nr myquery.aa -V 10 -B 100 -filter seg   -E 1e-10   -nogaps
      blastp nr myquery.aa  V 10  B 100  filter seg    E 1e-10    nogaps
      blastp nr myquery.aa  -v10  B=100  FILTER=seg   -e=1e-10   -nogaps
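
The equivalence of these spellings can be illustrated with a toy normalizer. This is only a sketch of the syntax rules described above, not the parser used by the BLAST programs; the option tables VALUE_OPTIONS and FLAG_OPTIONS and the function normalize are hypothetical and cover just the options appearing in the examples.

```python
import shlex

# Hypothetical option tables for this sketch; the real programs know many more.
VALUE_OPTIONS = {"v", "b", "e", "filter"}
FLAG_OPTIONS = {"nogaps"}


def normalize(command_line):
    """Return (program, database, query, options) with options in one canonical form."""
    tokens = shlex.split(command_line)  # also removes shell-style quotes
    program, database, query = tokens[:3]
    rest = tokens[3:]
    options = {}
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok.startswith("-"):
            tok = tok[1:]  # the leading hyphen is optional
        if "=" in tok:
            name, value = tok.split("=", 1)
        elif tok.lower() in FLAG_OPTIONS:
            name, value = tok, True
        elif tok.lower() in VALUE_OPTIONS:
            i += 1
            name, value = tok, rest[i]  # value supplied as the next argument
        elif tok[:1].lower() in VALUE_OPTIONS:
            name, value = tok[0], tok[1:]  # single-letter name fused to its value
        else:
            raise ValueError(f"unrecognized option: {rest[i]}")
        options[name.lower()] = value  # option names are case-insensitive
        i += 1
    return program, database, query, options


EXAMPLES = [
    "blastp nr myquery.aa  v=10  b=100  filter=seg    e=1e-10    nogaps",
    "blastp nr myquery.aa  V=10  B=100  filter=seg    E=1e-10    nogaps",
    "blastp nr myquery.aa -V=10 -B=100 -filter=seg   -E=1e-10   -nogaps",
    "blastp nr myquery.aa -V10  -B100  -filter seg   -E1e-10    -nogaps",
    'blastp nr myquery.aa -V10  -B100  -filter "seg" -E1e-10    -nogaps',
    "blastp nr myquery.aa -V 10 -B 100 -filter seg   -E 1e-10   -nogaps",
    "blastp nr myquery.aa  V 10  B 100  filter seg    E 1e-10    nogaps",
    "blastp nr myquery.aa  -v10  B=100  FILTER=seg   -e=1e-10   -nogaps",
]
parsed = [normalize(line) for line in EXAMPLES]
print(all(p == parsed[0] for p in parsed))
```

Each of the eight example lines reduces to the same program, database, query and option settings under these rules.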

Table of Options

altscore     E             gapW         lcfilter      nwstart     qtype                  stats
B            E2            gapX         lcmask        O           R                      sump
bottom       echofilter    getenv       links         olfraction  restest                T
C            endgetenv     gi           M             olmax       S                      top
cdb          endputenv     globalexit   maskextra     pingpong    S2                     topcomboE
compat1.3    errors        golfraction  matrix        poissonp    seqtest                topcomboN
compat1.4    evalues       golmax       mformat       postsw      shortqueryok           ucdb
consistency  filter        gspmax       mmio          progress    soffset                V
cpus         gapall        H            msgstyle      prune       sort_by_count          W
ctxfactor    gapdecayrate  haltonfatal  N             putenv      sort_by_highscore      warnings
dbbottom     gapE          hitdist      nogaps        pvalues     sort_by_pvalue         wink
dbchunks     gapE2         hspmax       nonnegok      Q           sort_by_subjectlength  wordmask
dbgcode      gapH          hspsepQmax   nosegs        qframe      sort_by_totalscore     wstrict
dbrecmax     gapK          hspsepSmax   noseqs        qoffset     span                   X
dbrecmin     gapL          K            notes         qrecmax     span1                  xmlcompact
dbslice      gaps          kap          novalidctxok  qrecmin     span2                  Y
dbtop        gapS2         L            nwlen         qres        spoutmax               Z

Table of Options with Descriptions

Option Description
  altscore="score_spec" alter individual scores or entire rows or columns of scores in a scoring matrix, without editing the scoring matrix file itself. score_spec is a quoted character string consisting of three components, each separated by white space: (1) a letter in the query sequence alphabet; (2) a letter in the subject sequence alphabet; (3) the new pairwise score to be assigned to the alignment of these two letters. If the query (subject) letter is specified as the special word any, the altered score will be assigned to the entire column (row) of the scoring matrix. If the indicated score is the special word min (max), the new assigned score will be the minimum (maximum) score observed in the matrix. If the score is given as na, the alignment of the indicated letters will not be allowed, effectively assigning to them an infinite negative score. Multiple altscore options can be specified on a given command line. As an example of the option's use, to assign an alignment score of zero (0) to the presence of a stop codon in either the query or database sequence, these two specifications can be used together: altscore="* any 0" altscore="any * 0".
See also: matrix, M and N.
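
The effect of a score_spec can be sketched on a small in-memory matrix. This is an illustration of the rules described above under stated assumptions, not the software's implementation: the dictionary representation, the function name apply_altscore and the three-letter toy alphabet are all hypothetical, and letters are matched exactly as written.

```python
def apply_altscore(matrix, score_spec):
    """Return a copy of matrix with one altscore specification applied.

    matrix maps (query_letter, subject_letter) -> score.
    """
    query, subject, score = score_spec.split()
    observed = list(matrix.values())
    if score == "min":
        new_score = min(observed)
    elif score == "max":
        new_score = max(observed)
    elif score == "na":
        new_score = float("-inf")  # alignment of these letters is not allowed
    else:
        new_score = int(score)
    updated = dict(matrix)
    for q, s in matrix:
        # The special word "any" matches every letter on that side.
        if query in ("any", q) and subject in ("any", s):
            updated[(q, s)] = new_score
    return updated


# Toy matrix over a hypothetical alphabet: two residues plus the stop symbol.
letters = "AC*"
toy = {(q, s): (4 if q == s else -4) for q in letters for s in letters}

# Equivalent of: altscore="* any 0" altscore="any * 0"
toy = apply_altscore(toy, "* any 0")
toy = apply_altscore(toy, "any * 0")
```

After both specifications are applied, every pair involving the stop symbol scores 0, while pairs such as A/A and A/C keep their original scores.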
  B=<b> set the maximum number of database sequences for which any alignments will be reported to b. The default limit is 250. The maximum number of alignments that may be saved and reported per database sequence is governed by other parameters.
See also: V, hspmax, gspmax, spoutmax and noseqs.
  bottom used to restrict the search of a nucleotide sequence to the bottom (-) strand. In the TBLASTX search mode, where both query and subject are nucleotide sequences, the bottom option only affects the query sequence.
See also: top, dbtop, dbbottom and qframe.
  C=<gcid> use the indicated genetic code to translate the query sequence in the BLASTX and TBLASTX search modes. gcid is a numerical identifier for the desired code. A list of the genetic codes and their identifiers is displayed if C=list is specified on an otherwise syntactically correct command line. (Example:  blastx foo foo c=list). In the TBLASTN search mode, the C parameter can be substituted for the dbgcode parameter.
The available genetic codes are:
   1. Standard*
   2. Vertebrate Mitochondrial
   3. Yeast Mitochondrial
   4. Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial;
        Mycoplasma; Spiroplasma
   5. Invertebrate Mitochondrial
   6. Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
   9. Echinoderm Mitochondrial
  10. Euplotid Nuclear
  11. Bacterial and Plant Plastid
  12. Alternative Yeast Nuclear
  13. Ascidian Mitochondrial
  14. Flatworm Mitochondrial
  15. Blepharisma Macronuclear
  16. Chlorophycean Mitochondrial
  21. Trematode Mitochondrial
  22. Scenedesmus obliquus mitochondrial
  23. Thraustochytrium mitochondrial
  1001. Codon2004
*The default genetic code (1).
Specify the desired genetic code by its number.
The Codon2004 code provides preliminary support for a draft alphabet for working precisely with each of the 64 possible codons, rather than mapping the codons to the usual 20 common amino acids. Scoring matrix files to use the Codon2004 alphabet with a translated query sequence in BLASTX should be placed in a subdirectory named ca, located parallel to the usual aa and nt subdirectories of the matrix directory. For use in TBLASTN searches, the scoring matrix should reside in an ac subdirectory; and for TBLASTX searches, the subdirectory should be cc. (Notice the use of the letter “c” for the codon alphabet, the letter “a” for the amino acid alphabet, and the query-subject ordering of the two letters to create the subdirectory name). For “codon-ized” scoring matrices derived from the BLOCKS database and appropriate for use “as is” with TBLASTX, please go here. For more information about the Codon2004 alphabet, please see this.
See also: dbgcode.
  cdb force nucleotide sequence databases to be searched in their compressed form. This option is only effective in the BLASTN search mode for word lengths ≥ 7. Users should generally avoid specifying this option themselves, letting the software decide when to employ this search strategy.
See also: ucdb.
  compat1.3 perform a BLAST version 1.3-style search (no gaps and significance estimated using Poisson statistics), but with bug fixes, performance enhancements and new options available.
See also: compat1.4.
  compat1.4 perform a BLAST version 1.4-style search (no gaps in the alignments), but with bug fixes, performance enhancements and new options available.
See also: compat1.3.
  consistency turn off the determination of “consistent” sets of HSPs, effectively lumping all HSPs found for a given database sequence into one set. Use of this option also disables a combinatorial adjustment that is otherwise made to the Sum and Poisson statistics to account for the consistent arrangement of the HSPs out of all possible relative arrangements. This option has no effect if Sum or Poisson statistics are not being used.
  cpus=<n> request that n processors or threads be employed for the search. The default behavior is to employ as many threads as there are processors in the computer system (to a maximum of 4 threads for BLASTN searches). This default may be altered by setting a specific value for cpus in a system-wide file named /etc/sysblast; see the sysblast.sample example file included in WU BLAST 2.0 software distributions for further information. NOTE: Memory consumption increases linearly with the number of threads; the actual number of threads employed may be automatically reduced by the software if memory resources are seen to be limiting.
  ctxfactor=<c> set the “context factor” that is used as a Bonferroni correction in the statistics to c, to account for the number of contexts searched. Each distinct reading frame-to-reading frame or strand-to-strand combination between query and subject sequences constitutes one “context”. Thus, one context exists in a BLASTP search, as many as two contexts (because of the two distinct strand combinations) exist in a BLASTN search, up to 6 contexts (one for each reading frame) exist in a BLASTX or TBLASTN search, and up to 6x6 = 36 contexts exist in a TBLASTX search. The maximum default value for ctxfactor then is 1 for BLASTP, 2 for BLASTN, 6 for BLASTX and TBLASTN, and 36 for TBLASTX. Restricting a search to a single strand of the query and/or database reduces the number of contexts accordingly for that search. More accurately, however, the contribution of any given context to the default value for ctxfactor is the fraction of residues in the query (or reading frame of the query) that are unambiguous (up to a maximum value of 1.0). (N.B. this fraction is computed after any optional filtering has been applied to the query). The default ctxfactor is then merely the sum of these fractions for every context involved in the search.
The software should normally be allowed to set the value of this parameter itself, unless the user has a compelling reason to change it. One rationale for explicitly setting a value for ctxfactor might be to ensure a constant value is used in the statistics across multiple searches, where the results from the searches need to be examined and compared for their statistical significance on a common basis.
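
The default computation described above (the sum, over all contexts, of the unambiguous fraction of the query, each capped at 1.0) can be sketched as follows. The function names are hypothetical, and treating only N as an ambiguity code is a simplifying assumption of this sketch; the software's own notion of an ambiguous residue may be broader.

```python
def unambiguous_fraction(sequence, ambiguity_codes):
    """Fraction of residues in sequence that are not ambiguity codes."""
    if not sequence:
        return 0.0
    ambiguous = sum(1 for residue in sequence.upper() if residue in ambiguity_codes)
    return (len(sequence) - ambiguous) / len(sequence)


def default_ctxfactor(context_fractions):
    """Sum the per-context fractions, each capped at 1.0."""
    return sum(min(1.0, fraction) for fraction in context_fractions)


# Hypothetical BLASTN query searched on both strands: two contexts, and the
# reverse complement has the same unambiguous fraction as the top strand.
query = "ACGTNNACGT"
fraction = unambiguous_fraction(query, ambiguity_codes="N")
ctxfactor = default_ctxfactor([fraction, fraction])
print(ctxfactor)
```

Here 8 of the 10 query residues are unambiguous, so each strand contributes 0.8 and the sketch yields a value below the nominal maximum of 2 for BLASTN.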
  dbbottom used to restrict the search to the bottom (-) strand of all database sequences.
See also: dbtop, top, bottom and qframe.
  dbchunks=<nchunks> establishes the granularity of the database, as it is divided into slices for assignment to individual threads, to make more efficient use of all CPUs when multiple CPUs are employed for a given search. Higher values are appropriate when the database contains relatively few sequences and/or when the sequences vary greatly in length, composition or content (e.g., genomic contigs). Lower values are appropriate when the database contains many sequences of comparable length (e.g., the EST division of GenBank). The minimum assignable value is the number of threads employed, but this setting is ill-advised; the optimal value for any given search type is likely to be a large multiple of the number of threads employed (although it need not be an exact multiple). When searching mammalian genomic contigs, a good value may be 1000. The default value is 500.
Users generally need not be concerned with this parameter.
  dbgcode=<gcid> use the indicated genetic code to translate database sequences in the TBLASTN and TBLASTX search modes. gcid is a numerical identifier for the desired code. A list of the genetic codes and their identifiers is displayed if dbgcode=list is specified on an otherwise syntactically correct command line. (Example:  tblastn foo foo dbgcode=list).
See also: C.
  dbrecmax=<last_record> search the database until last_record, where database records are numbered starting with 1. By default, databases are searched completely. If last_record is greater than the actual number of records in the database, the database is simply searched until its end. It is an error for the requested last_record to be less than the first record requested to be searched in the database. Records in virtual databases are numbered with respect to the entire virtual database.
See also: dbrecmin.
  dbrecmin=<first_record> search the database beginning at first_record, where database records are numbered starting with 1. By default, databases are searched completely. It is an error for the requested first_record to be greater than the last record requested to be searched in the database (re: the dbrecmax parameter) or to point beyond the end of the database. Records in virtual databases are numbered with respect to the entire virtual database.
See also: dbrecmax.
  dbslice=m/n
  dbslice=a-b/n
at run time, for expressions of the form m/n, logically divide the database into n equivalent-sized slices and search only the mth slice, where 1 ≤ m ≤ n ≤ 100000. Alternatively, for expressions of the form a-b/n, search slices a through b (inclusive), where 1 ≤ a ≤ b ≤ n. Slice size is determined solely by the number of sequence records contained within and is not a function of sequence length. This can produce significant disparities in the workload associated with different slices, which may be alleviated by randomizing the order of sequences in the database before formatting for BLAST. In distributed computing environments, when the same, large database is to be searched repeatedly, overall throughput will likely benefit from consistently assigning the same slice(s) to the same client nodes for each search; improved efficiency results from the file caching activity that is typically performed by operating systems when the database files are first read from disk or over a network. Logically breaking the database into slices at run time means that each client node need only have sufficient unused memory as to be able to cache its assigned slice(s), not the entire database, and that the database need not be physically divided and reformatted into many smaller sub-databases whenever the number of available client nodes changes.
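
To make the two expression forms concrete, the sketch below parses a dbslice value and maps it onto a range of record numbers. The helper names are hypothetical, and the boundary arithmetic assumes that slices are contiguous runs of records of near-equal count; the exact boundaries chosen by the software are not documented here and may differ.

```python
import re

MAX_SLICES = 100000  # upper bound on n stated above


def parse_dbslice(spec):
    """Parse 'm/n' or 'a-b/n' into (first_slice, last_slice, n)."""
    match = re.fullmatch(r"(\d+)(?:-(\d+))?/(\d+)", spec)
    if match is None:
        raise ValueError(f"malformed dbslice expression: {spec!r}")
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    n = int(match.group(3))
    if not (1 <= first <= last <= n <= MAX_SLICES):
        raise ValueError(f"dbslice out of range: {spec!r}")
    return first, last, n


def slice_records(total_records, spec):
    """Return the 1-based (first_record, last_record) covered by spec.

    Assumes contiguous slices of near-equal record count.
    """
    first, last, n = parse_dbslice(spec)
    first_record = (first - 1) * total_records // n + 1
    last_record = last * total_records // n
    return first_record, last_record


print(slice_records(1000, "3/4"))
print(slice_records(1000, "2-3/4"))
```

Under these assumptions, a node assigned dbslice=3/4 of a 1000-record database would search records 501 through 750, and dbslice=2-3/4 would cover records 251 through 750.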
  dbtop used to restrict the search to the top (+) strand of all database sequences.
See also: dbbottom, top, bottom and qframe.
  E=<e> set the expectation threshold for reporting database hits to e. A database sequence will only be reported if an ascribed E-value for at least one of its alignments (or groups of alignments) is ≤ E. Lower E-values are more significant (less likely to occur by chance). The default threshold is E=10, such that if the search algorithm exhibited 100% sensitivity and the statistics applied perfectly to the sequences being studied, results involving 10 database sequences would be reported merely by chance.
See also: S.
  E2=<e> set the expectation threshold for saving ungapped HSPs to e. In the initial, ungapped alignment phase of a search, individual HSPs will only be saved for further use if their score is ≥ S2, where the default value of S2 is computed from E2. The default value for E2 varies between BLAST search modes; the resultant value for S2 will depend on the scoring system, as well. If both E2 and S2 are specified on the command line, the one corresponding to the more restrictive (higher) score threshold will be used.
See also: gapE2, S2 and gapS2.
  echofilter display the query sequence in the BLAST report, after all hard masks have been applied.
See also: filter and lcfilter.
  endgetenv ignore any subsequent getenv options found on the command line during left-to-right parsing.
See also: endputenv, getenv and putenv.
  endputenv for security in WWW server installations, where the command line may sometimes be left open to users, ignore any subsequent putenv options found on the command line during left-to-right parsing.
See also: endgetenv, getenv and putenv.
  errors suppress all ERROR messages. These messages should rarely, if ever, arise and indicate severe conditions (typically internal software bugs) that should be given immediate attention. When they do arise, parsers may break. If any ERRORs arise with this option, the number SUPPRESSED will be reported at the end of the search.
  evalues report E-values (expectations) instead of P-values (probabilities) in the initial one-line descriptions section of output.
See also: pvalues.
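
For readers converting between the two forms of report, the sketch below uses the relationship P = 1 - exp(-E), which holds when the number of chance hits is Poisson-distributed, as is standard in Karlin-Altschul statistics. This page does not itself state the conversion, so treat it as an assumption; the function names are hypothetical.

```python
import math


def e_to_p(e_value):
    """P-value for a given E-value, assuming P = 1 - exp(-E)."""
    return -math.expm1(-e_value)  # expm1 keeps precision for tiny E


def p_to_e(p_value):
    """E-value for a given P-value, the inverse of e_to_p."""
    return -math.log1p(-p_value)


for e in (1e-10, 0.5, 10.0):
    print(e, e_to_p(e))
```

For small values the two quantities are nearly identical; they diverge as E approaches and exceeds 1, where P saturates just below 1.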
  filter=<filter> “hard mask” the query sequence using the specified filter. The filter program may alter the sequence in composition but not in length. For protein-level searches (BLASTP, BLASTX, TBLASTN and TBLASTX), the supported filter programs include: seg and xnu. For nucleotide-level (BLASTN) searches, supported filter programs include: dust and seg. If multiple filter specifications are made on the command line, their results are logically OR-ed.
filter=none causes any earlier specifications (to the left) on the command line to be ignored.
NOTE: By default, no filtering is performed.
Arbitrary user-defined filter programs can be utilized, if their input and output are sequences in FASTA/Pearson format and if input/output are tied to stdin/stdout.
The location of filter programs is governed by the BLASTFILTER environment variable, which can be set to a colon-delimited list of directories that the BLAST programs will successively examine to find filters.
See also: wordmask, lcfilter, lcmask and echofilter.
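
A user-defined filter only has to read FASTA from its input and write FASTA of the same length to its output. The following is a minimal sketch of such a filter, assuming a deliberately simple masking rule (runs of one repeated residue); mask_runs, filter_fasta and the min_run threshold are hypothetical and are not part of the distributed software. Runs are detected within each line only, so a run that spans a line break in the input is not joined.

```python
import io


def mask_runs(sequence, min_run=6, mask_char="X"):
    """Replace runs of one repeated residue of length >= min_run with mask_char."""
    out = []
    i = 0
    while i < len(sequence):
        j = i
        while j < len(sequence) and sequence[j] == sequence[i]:
            j += 1
        run = sequence[i:j]
        # Masking substitutes letters one-for-one, so the length never changes.
        out.append(mask_char * len(run) if len(run) >= min_run else run)
        i = j
    return "".join(out)


def filter_fasta(in_stream, out_stream, **kwargs):
    """Copy FASTA records from in_stream to out_stream, masking sequence lines."""
    for line in in_stream:
        line = line.rstrip("\n")
        if line.startswith(">"):
            out_stream.write(line + "\n")  # definition lines pass through unchanged
        else:
            out_stream.write(mask_runs(line, **kwargs) + "\n")


# Demonstration on an in-memory record with a hypothetical definition line.
demo = io.StringIO(">query1 hypothetical protein\n" + "MK" + "A" * 7 + "LV" + "\n")
result = io.StringIO()
filter_fasta(demo, result)
print(result.getvalue(), end="")
```

To use such a function as a stand-alone filter program, it would be called as filter_fasta(sys.stdin, sys.stdout), which ties input and output to stdin and stdout as required above.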
  gapall effectively generate a gapped alignment for every ungapped HSP found (up to hspmax). This is the default behavior.
See also: gapE.
  gapdecayrate=<r> define r to be the common ratio of the terms in a geometric progression used in altering probabilities as a function of the number of Poisson events involved (typically the number of “consistent” HSPs in a set), according to a method suggested by Phil Green. An initial Poisson probability for n HSPs is weighted by the quantity T_n, which is itself the reciprocal of the nth term in the progression t_n = (1-r) r^{n-1}. The default value for r is 0.5, such that the default weights are successively T_1=2, T_2=4, T_3=8, T_4=16, and so on. These weights provide a conservative Bonferroni correction to the probabilities, in case multiple trials are performed in determining which set of HSPs yields the lowest P-value for a given database sequence. That the geometric progression contains an infinite number of terms allows it to satisfy the need for any number of tests (and weights), when this number is unknown prior to the search.
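
The weights follow directly from the progression: the nth term is (1-r) times r raised to the power n-1, and the weight is its reciprocal. A minimal sketch, with a hypothetical function name and with the restriction 0 < r < 1 imposed as an assumption of the sketch:

```python
def decay_weights(r, count):
    """Return [T_1, ..., T_count], where T_n = 1 / ((1 - r) * r**(n - 1))."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return [1.0 / ((1.0 - r) * r ** (n - 1)) for n in range(1, count + 1)]


print(decay_weights(0.5, 4))  # [2.0, 4.0, 8.0, 16.0]
```

Because the terms of the progression sum to 1 over all n, the reciprocal weights behave like a Bonferroni correction spread over an unbounded number of possible tests.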
  gapE=<gapE> generate gapped alignments for all HSPs between sequences whose expected frequency of chance occurrence is ≤ gapE. Default value is gapE=infinity, i.e., gapall is in effect.
See also: gapall.
  gapE2=<e> set the E-value for saving gapped HSPs to e. In the secondary, gapped alignment phase of a search, individual gapped HSPs will only be saved for further use if their score is ≥ gapS2, where the default gapS2 is computed from gapE2. The default value for gapE2 varies between BLAST search modes; the resultant gapS2 will depend on the scoring system, as well. If both gapE2 and gapS2 are specified on the command line, the one corresponding to the more restrictive (higher) score threshold will be used.
See also: gapS2, E2 and S2.
  gapH=<h> set the value of the relative entropy, H, used in evaluating the statistical significance of gapped alignment scores.
See also: H.
  gapK=<k> set the value of the extreme value statistics K parameter (Karlin and Altschul, 1990) used in evaluating the significance of gapped alignment scores. Useful when precomputed values are unavailable in the internal tables for the chosen scoring matrix and gap penalty combination.
See also: K.
  gapL=<lambda> use lambda for the value of the λ parameter in the extreme value statistics used to evaluate the significance of gapped alignment scores (Altschul and Gish, 1996). Useful when precomputed values are unavailable in the internal tables for the chosen scoring matrix and gap penalty combination.
See also: L.
  gaps produce gapped alignments (the default behavior), negating the effect of any previously specified nogaps option.
See also: nogaps and gapall.
  gapS2=<s> set the score threshold for saving gapped HSPs to s. In the secondary, gapped alignment phase of a search, individual gapped HSPs will only be saved for further use if their score is ≥ gapS2. The default score threshold is computed from gapE2 and will depend on the scoring system. If both gapE2 and gapS2 are specified on the command line, the one corresponding to the more restrictive (higher) score threshold will be used.
See also: gapE2, E2 and S2.
  gapW=<gapW> set the window width (or band width) within which gapped alignments are computed by dynamic programming (default is gapW=32 for protein comparisons, gapW=16 for BLASTN). Note: gapW is the full bandwidth, not the half-width.
  gapX=<x> set the drop-off score for gapped alignment extensions to x. Gapped extension of ungapped HSPs found between query and subject sequences continues until the cumulative alignment score deteriorates from the maximum value seen thus far by a quantity gapX or more. The default value for gapX is the score associated with 10 bits of significance (2^-10 < 10^-3 probability) for protein-level searches or 20 bits of significance (2^-20 < 10^-6 probability) for nucleotide-level (BLASTN) searches. Higher values for gapX will increase sensitivity at the expense of run time.
See also: X and gapW.
  getenv="NAME" display the value of the environment variable named NAME. This may be useful for verifying that the settings of environment variables on a web server or in an analysis pipeline have been propagated all the way to the BLAST search program.
See also: endgetenv, putenv and endputenv.
  gi report NCBI “gi” (GenInfo) identifiers for sequences, when present in sequence definition lines. Normally these identifiers are suppressed from output, but they represent one of the best, stable identifiers available for the GenBank/EMBL/DDBJ databases (with ACCESSION.VERSION being the other stable identifier).
  globalexit when processing a file containing multiple query sequences, if any of them encounters a FATAL error, then after all queries have been processed, append the line "EXIT CODE 12" to the output and provide a testable exit status 12; if the exit status is 0 or if the last line of output is not "EXIT CODE 12", then it can be assumed that all queries succeeded. To determine whether all queries succeeded without this option, the output would need to be scanned for instances of EXIT CODE with a non-zero argument. With the globalexit option, scanning of the output is only necessary when one wishes to identify the specific query (or queries) that failed and what the individual reason codes were.
See also: haltonfatal.
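
Scanning a multi-query report for failures might look like the sketch below. It assumes that each status line begins with the words EXIT CODE followed by an integer, as the description above suggests; the exact layout of such lines is not specified here, and the sample report text, reason codes and function name are hypothetical.

```python
import re

# Assumed layout: "EXIT CODE <n>" at the start of a line.
EXIT_LINE = re.compile(r"^EXIT CODE (\d+)", re.MULTILINE)


def failed_exit_codes(report_text):
    """Return the non-zero codes from every EXIT CODE line, in order of appearance."""
    codes = (int(code) for code in EXIT_LINE.findall(report_text))
    return [code for code in codes if code != 0]


# Hypothetical fragment of concatenated output for three queries.
sample_report = (
    "...report for query 1...\n"
    "EXIT CODE 0\n"
    "...report for query 2...\n"
    "EXIT CODE 16\n"
    "...report for query 3...\n"
    "EXIT CODE 0\n"
    "EXIT CODE 12\n"
)
print(failed_exit_codes(sample_report))
```

With globalexit in effect, checking the process exit status is enough to detect that something failed; a scan like this is only needed to find which queries failed and why.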
  golfraction=<g> maximum fractional length of overlap, g, of two gapped alignments for them to be considered independent and mutually “consistent” and their joint (Sum or Poisson) probability to be computed. The default value is 0.125 (maximum 12.5% of the length from either end of either HSP). For any given pair of HSPs, the more restrictive of golfraction and golmax is used.
See also: golmax, olfraction, and olmax.
  golmax=<len> set the maximum permitted length of overlap (in residues), len, of two gapped alignments for their joint (Sum or Poisson) probability to be computed. The default is unlimited length, with the maximum extent of overlap being governed only by the golfraction parameter.
See also: golfraction, olfraction, and olmax.
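
The interaction of golfraction and golmax can be sketched as a single test in which the more restrictive limit wins. This is an interpretation of the descriptions above and not the software's code: the function name is hypothetical, the fractional limit is taken against the shorter of the two alignments, and no rounding is applied.

```python
def overlap_permitted(overlap, len_a, len_b, golfraction=0.125, golmax=None):
    """True if two gapped alignments overlapping by `overlap` residues may be combined.

    len_a and len_b are the lengths of the two alignments; golmax=None
    means no absolute limit, matching the default described above.
    """
    limit = golfraction * min(len_a, len_b)  # fraction applies to either HSP
    if golmax is not None:
        limit = min(limit, golmax)  # the more restrictive limit is used
    return overlap <= limit


print(overlap_permitted(10, 80, 200))
print(overlap_permitted(11, 80, 200))
print(overlap_permitted(6, 80, 200, golmax=5))
```

With the default fraction, alignments of length 80 and 200 may overlap by at most 10 residues in this sketch; adding golmax=5 tightens the limit to 5.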
  gspmax=<gspmax> establish gspmax as the maximum number of GSPs (gapped HSPs) to report per subject sequence or pairwise sequence comparison. If more than gspmax GSPs are found, only the best-scoring GSPs are retained for subsequent processing and reporting. The setting of gspmax will have no effect if the nogaps option is specified or if the setting of hspmax is more restrictive.
The default value for gspmax is 0, which implies no limit.
See also: hspmax, spoutmax.
NOTE: the B and V options limit the number of subject sequences for which any results whatsoever are reported, regardless of the number of HSPs or GSPs found.
  H=<h> use h for the value of the relative entropy, H, when computing the statistics of ungapped alignments.
NOTE: In BLAST 1.4 and earlier, the H option was used to invoke the display of a histogram of search results; this functionality is no longer supported.
See also: gapH.
  haltonfatal when processing a file containing multiple query sequences, use this option to halt further processing at the first occurrence of a FATAL error. Processing will otherwise resume with the next query sequence when a FATAL error arises.
See also: globalexit.
  hitdist=<hitdist> invoke a 2-hit BLAST algorithm similar to (but more sensitive and efficient than) that of Altschul et al. (1997), with the maximum distance between word hits along the same diagonal of <hitdist> residues, for seeding ungapped extensions. Altschul et al. (1997) use the equivalent of hitdist=40 in the BLASTP, BLASTX, TBLASTN and TBLASTX search modes. In WU BLASTN, setting hitdist=W and wink=W, where W is the word length, is akin to using double-length words generated on W-mer boundaries.
NOTE: In protein-level comparisons, for best sensitivity (or the best sensitivity for the amount of memory used), 2-hit BLAST is not recommended.
See also: wink.
  hspmax=<hspmax> establishes hspmax as the maximum number of ungapped HSPs that will be saved per subject sequence or pairwise sequence comparison. Saved HSPs are then fed to the gapped alignment phase of the program or are statistically evaluated if gapped alignments are not to be performed. If more than hspmax HSPs are found, only the best-scoring HSPs are retained for subsequent processing.
The default value is 1000; a value of 0 signifies no limit.
See also: gspmax and spoutmax.
NOTE: This usage of hspmax is subtly, but importantly, different from the parameter's classical interpretation, wherein all ungapped HSPs that satisfied the S2 score threshold were saved; hspmax merely limited the number of HSPs (gapped or ungapped) that would be reported. The new interpretation was instituted to provide vastly improved speed on large problems, while imparting no effect on small problems and many medium-sized problems. The new behavior can help guard against horrendously slow searches resulting from an inadvertent omission of a low-complexity filter. Adverse effects on sensitivity may be obtained, however, if every HSP is sacred. To restore classical behavior, specify hspmax=0. As a compromise between sensitivity and speed, set a higher value than the default.
NOTE: the B and V options limit the number of database or subject sequences for which any results are reported, regardless of the number of HSPs or GSPs found.
  hspsepQmax=<d> maximum allowed separation along the query sequence between two HSPs (gapped or ungapped) that will be clustered into a “consistent” set. Distance is measured here in units of residues at the level of the actual sequence comparison, i.e., in nucleotides for BLASTN and in peptides (or codons) for all other search modes. This option is useful for improving the statistical power to discriminate clusters that have potential biological interest from random background clusters, when the query sequence is significantly longer than the features of interest. Without this restriction, HSPs may be linked that arise from very distant portions of the query sequence. Depending on the specific search performed, distant links may be desirable, but often a reasonable setting for this parameter might be the expected maximum length of an intron. A distance restriction not only avoids clustering HSPs that would be widely separated but improves the statistics of those HSPs that still can be clustered.
  hspsepSmax=<d> maximum allowed separation along the subject (database) sequence between two HSPs (gapped or ungapped) that will be clustered into a consistent set. Distance is measured here in units of residues at the level of the actual sequence comparison, i.e., in nucleotides for BLASTN and in peptides (or codons) for all other search modes. This option is useful for improving the statistical power to discriminate clusters that have potential biological interest from random background clusters, when the database contains sequences significantly longer than the features of interest. Without this restriction, HSPs may be linked that arise from very distant portions of a subject sequence. Depending on the specific search performed, distant links may be desirable, but often a reasonable setting for this parameter might be the expected maximum length of an intron. A distance restriction not only avoids clustering HSPs that would be widely separated but improves the statistics of those HSPs that still can be clustered.
  K=<k> set the value for extreme value statistics K parameter (Karlin and Altschul, 1990) used in computing the statistics of ungapped alignments.
See also: gapK.
  kap use basic Karlin and Altschul (1990) statistics on individual alignment scores (i.e., do not evaluate the joint probability of multiple consistent HSP scores, such as with Poisson or Karlin and Altschul (1993) “Sum” statistics); in order to be reported, each HSP must pass the significance test on its own; these basic statistics are an option in all search modes.
  L=<lambda> use lambda for the value of the λ parameter in the extreme value statistics (Karlin and Altschul, 1990) used in computing the statistics of ungapped alignments.
See also: gapL.
  lcfilter replace any lower case letters in the input query sequence with the appropriate ambiguity code for “any” residue (N for nucleotide sequences; X for protein sequences).
See also: lcmask, filter, wordmask and echofilter.
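
The substitution performed by lcfilter is simple enough to show directly. A minimal sketch, with a hypothetical function name and an is_protein flag standing in for the program's knowledge of the query alphabet:

```python
def lcfilter(sequence, is_protein=False):
    """Replace every lower case letter with the 'any residue' code (N or X)."""
    any_code = "X" if is_protein else "N"
    return "".join(any_code if ch.islower() else ch for ch in sequence)


print(lcfilter("ACGTacgtACGT"))              # ACGTNNNNACGT
print(lcfilter("MKvlaAG", is_protein=True))  # MKXXXAG
```

Unlike the soft masking of lcmask, this changes the residues themselves, so the replaced positions cannot contribute matching letters to any alignment.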
  lcmask when generating the neighborhood word list for the query sequence, do not process any portions of the query that were represented in lower case letters in the input file. Lower case letters in the query sequence remain unchanged by this “soft masking” procedure and can therefore participate in alignments seeded by word hits that occur in flanking regions.
See also: lcfilter, wordmask, filter, maskextra and echofilter.
  links report consistent link information for each alignment, indicating the set of “consistent” alignments used in joint statistical significance calculations. Links information appears on its own line for each HSP and begins with the keyword Links. Each HSP involving the query and a given subject sequence is numbered from 1 to n, where n is the total number of HSPs reported for the pair of sequences. When the links option is specified, the current HSP number is enclosed in parentheses.

For example, the links information for an HSP might look like the following, where the HSP number 1 enclosed in parentheses indicates that this information accompanied the first HSP reported for the given subject sequence. It is evident in this example that a total of at least 8 HSPs were reported for the subject sequence (re: the 8 in the links list), but only 3 consistent HSPs (numbers 8, 2 and 1, in that order) were involved in obtaining the Sum statistics P-value of 0.15.

		 Score = 72 (30.4 bits), Expect = 0.16, Sum P(3) = 0.15
		 Identities = 41/174 (23%), Positives = 74/174 (42%)
		 Links = 8-2-(1)
		 
NOTE: While all link lists describe sets of consistent HSPs, unless one of the topcomboN or topcomboE options is used, only the list reported for HSPs in the most significant set for each subject sequence is guaranteed to represent the precise set of HSPs for which the joint statistics were computed; all other link lists often do correctly describe the set of HSPs involved but may have one or more missing or extraneous HSPs.
See also: hspsepQmax, hspsepSmax, topcomboE and topcomboN.
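
For readers who post-process reports, the following is a minimal sketch of how a Links line such as the one in the example above might be parsed. It assumes that members are always separated by hyphens and that exactly one member is enclosed in parentheses, which is inferred from the single example shown here rather than from a format specification; the function name parse_links is hypothetical.

```python
import re

def parse_links(line):
    """Parse a line such as 'Links = 8-2-(1)'.

    Returns (members, current): the HSP numbers in the order listed and
    the number of the HSP the line belongs to (the parenthesized entry).
    Assumes hyphen-separated integers, inferred from one example only.
    """
    match = re.search(r"Links\s*=\s*(\S+)", line)
    if match is None:
        raise ValueError("not a Links line: %r" % line)
    members, current = [], None
    for token in match.group(1).split("-"):
        if token.startswith("(") and token.endswith(")"):
            current = int(token[1:-1])   # the HSP this line accompanies
            members.append(current)
        else:
            members.append(int(token))
    return members, current

members, current = parse_links("\t\t Links = 8-2-(1)")
print(members, current)
```

The length of the returned list corresponds to the count in the Sum P(n) field of the same HSP (3 in the example above).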
  M=<m> set the positive reward score for matching nucleotides in the BLASTN search mode to m, with default value +5.
For compatibility with earlier versions of BLAST, in search modes other than BLASTN, the M option is synonymous with the matrix option. To provide a fully specified scoring matrix to BLASTN, the matrix option itself must be used.
See also: N, matrix and altscore.
  maskextra=<extra> soft mask for an additional extra letters to each side of regions that are soft masked by the lcmask and wordmask options. This reduces the incidence of high scoring alignments in low-complexity regions that would be initiated by spurious word hits in otherwise unmasked flanking regions.
See also: wordmask, lcmask and lcfilter.
  matrix=<name> use the 2-dimensional matrix named name to score residue pairs in gapped and ungapped alignments. The default matrix for protein-level searches is BLOSUM62 (Henikoff and Henikoff, 1992). For BLASTN searches, the default scoring matrix is computed dynamically from a +5/-4 match/mismatch scoring system which can be altered using the M and N parameters. BLASTN can also use fully specified scoring matrices of the user's own design, by providing the name of the matrix with the matrix option. After unpacking the software, see the matrix/nt subdirectory for some examples of nucleotide scoring matrices.
NOTE: matrices need not be symmetric about their major diagonal. The row-column format of a matrix corresponds to query-subject letter pairs.
See also: altscore, M and N.
  mformat=<m>[,outfile] used to select an output format by numerical identifier, m, and optionally the name of the file where the output should be written, outfile. Multiple formats may be chosen for simultaneous output during a single search, as long as a different outfile is indicated for each format. If no outfile is specified, either standard output (stdout) or the setting of the O option (if set) is used. At most one mformat specification on a given command line may lack an outfile. If outfile contains any white space (e.g., blanks or tabs), the entire token should be enclosed in quotes, to prevent command line interpreters from breaking it into separate arguments.
The various output formats available are displayed if mformat=list is specified on an otherwise syntactically correct command line. (Example:  blastp foo foo mformat=list). Setting mformat=0 clears any mformat specification(s) appearing to the left on the command line.
Depending on the output format, some command line options cause additional elements to appear; these options include: topcomboN, topcomboE and links.

The available choices for m and their associated formats are:

list output this list and halt;
0 reset to default output only;
1 pairwise (default);
2 tabular (see description);
3 tabular with comments (see description);
4 PostScript™ graphics* (see description);
5 neighborhood word listings;*
7 XML conforming to NCBI_BlastOutput.dtd (see example, best viewed in Firefox).

*Formats that are subject to change or removal without notice.

See also: msgstyle, O and xmlcompact.
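
The rules above for combining several mformat specifications can be checked before launching a search. The sketch below is illustrative only and is not part of the BLAST software: the function name check_mformat_args and the file names are hypothetical, and only the rules stated in this entry are tested (at most one specification without an outfile, a different outfile for each format, and mformat=0 clearing specifications to its left).

```python
def check_mformat_args(args):
    """Return a list of problems found among the mformat tokens in args."""
    active = []  # (format, outfile or None) specifications still in effect
    for arg in args:
        # Leading hyphens are optional and option names are case-independent.
        name, sep, value = arg.lstrip("-").partition("=")
        if name.lower() != "mformat" or not sep:
            continue
        fmt, _, outfile = value.partition(",")
        if fmt == "0":
            active = []                       # clears everything to the left
        elif fmt.lower() != "list":           # "list" only prints the choices
            active.append((fmt, outfile or None))

    problems = []
    without_file = [fmt for fmt, outfile in active if outfile is None]
    if len(without_file) > 1:
        problems.append("more than one mformat lacks an outfile: "
                        + ", ".join(without_file))
    named = [outfile for _, outfile in active if outfile is not None]
    reused = sorted({f for f in named if named.count(f) > 1})
    if reused:
        problems.append("outfile reused by several formats: " + ", ".join(reused))
    return problems

# Hypothetical command line: tabular and XML to files, pairwise to stdout.
cmd = ["blastp", "mydb", "query.fa",
       "mformat=2,hits.tab", "mformat=7,hits.xml", "mformat=1"]
print(check_mformat_args(cmd))
```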
  mmio turns off the use of memory-mapped I/O when reading database files. Use of this option will usually slow the search, particularly when multiple processors are being used, but it serves both to demonstrate the effectiveness of this form of I/O and to validate the associated I/O routines. Note that no special daemon or support programs (such as the old memfile program) are required to take full advantage of memory-mapped I/O. When running 32-bit versions of the BLAST software, the mmio option might free up important virtual address space for use as working storage or heap memory.
For the vast majority of users, this option should never be used.
  msgstyle=<n> used to select by numerical identifier, n, the style of informatory messages to produce (i.e., NOTEs, WARNINGs, etc.).
The available choices for n and their associated styles are:
    0 => line-wrapped (default)
    1 => single-line with the query sequence identifier embedded (if available)
		
  N=<n> set the negative penalty score for mismatching nucleotides in the BLASTN search mode to n, with default value -4.
See also: M, matrix, and altscore.
  nogaps do not create gapped alignments, in essence reverting to WU BLAST 1.4 behavior.
See also: gaps and gapall.
  nonnegok do not abort processing with a FATAL error when the expected score is non-negative. Formally, for Karlin-Dembo-Altschul statistics to apply to the evaluation of the alignment scores found during a search, the expected score for a sequence having the same residue composition as the query must be negative, but this condition does not always hold with unusual scoring matrices or query sequences. Use the nonnegok option to cause the search to proceed even under these unusual conditions.
See also: novalidctxok and shortqueryok.
  nosegs do not segment the query sequence on hyphens (-). By default, hyphens in the query sequence create insurmountable barriers for sequence alignment. As an example of where this feature is useful, multiple contigs may be concatenated together into one sequence with a hyphen separating each contig; no alignment will then extend beyond a contig boundary.
CAUTION: do not confuse this option with the similarly appearing noseqs option.
  noseqs produce abbreviated output by omitting the sequence alignments. The result is often correctly interpretable by parsers of normal output.
CAUTION: do not confuse this option with the similarly appearing nosegs option.
  notes suppress all NOTE messages. Important recommendations from the software may be missed if this option is used. If any NOTEs arise with this option, the number SUPPRESSED will be reported at the end of the search.
See also: warnings.
  novalidctxok do not treat it as a FATAL error when none of the “contexts” (e.g., strands or reading frames) of the query are valid. A valid context is one in which the threshold score for saving alignments can be achieved under ideal circumstances (typically if an alignment of 100% identity were to be found).
See also: nonnegok and shortqueryok.
  nwlen=<len> generate neighborhood words (or seed words) starting from the beginning of the query sequence (or from the location specified with the nwstart parameter) and continuing for the distance len or to the end of the sequence, whichever comes first. While this parameter can be used to restrict the region in which word hits occur for seeding ungapped alignments (and indirectly gapped alignments), it does not restrict alignments from extending beyond this region.
See also: nwstart.
  nwstart=<start> generate neighborhood words (or seed words) starting from coordinate position start in the query sequence and continuing to the end of the sequence (or for the distance specified with the nwlen parameter). While this parameter can be used to restrict the region in which word hits occur for seeding ungapped alignments (and indirectly gapped alignments), it does not restrict alignments from extending beyond this region.
See also: nwlen.
  O=<outfile> output results to the file named outfile instead of standard output (stdout).
  olfraction=<f> set the maximum fractional length of overlap, f, of two ungapped alignments for them to be considered independent and mutually “consistent” and their joint (Sum or Poisson) probability to be computed. The default f is 0.1 (maximum 10% of the length from either end of either HSP). For any given pair of HSPs, the more restrictive of olfraction and olmax is used.
See also: golfraction, golmax, and olmax.
  olmax=<len> set the maximum permitted length of overlap (in residues), len, of two ungapped alignments for their joint (Sum or Poisson) probability to be computed. The default is unlimited length, with the maximum extent of overlap being governed only by the olfraction parameter.
See also: golfraction, golmax, and olfraction.
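
The interplay of olfraction and olmax can be sketched as follows. This is one possible reading of the descriptions above, not the software's actual code: it assumes the fractional limit applies to the shorter of the two HSPs (so that the limit holds for either HSP) and that "more restrictive" means the smaller of the two resulting lengths. The function names are hypothetical.

```python
def max_allowed_overlap(len_a, len_b, olfraction=0.1, olmax=None):
    """Largest overlap (in residues) still treated as consistent.

    len_a and len_b are the lengths of the two ungapped HSPs.
    olmax=None models the default of unlimited length.
    """
    by_fraction = olfraction * min(len_a, len_b)
    if olmax is None:
        return by_fraction
    return min(by_fraction, olmax)   # the more restrictive limit wins

def overlap_ok(overlap, len_a, len_b, olfraction=0.1, olmax=None):
    """True if two HSPs overlapping by `overlap` residues may be combined."""
    return overlap <= max_allowed_overlap(len_a, len_b, olfraction, olmax)

print(max_allowed_overlap(200, 50))           # fraction-limited
print(max_allowed_overlap(200, 50, olmax=3))  # olmax-limited
```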
  pingpong perform additional work to help ensure the alignments produced are locally optimal. This option typically adds 3-10% to the execution time, usually without affecting the results. Only rarely is an alignment and its associated score improved, for the additional time consumed by using this option.
  poissonp use Poisson statistics (Karlin and Altschul, 1990) to compute joint P-values of consistent sets of alignments; Poisson statistics are an option in all search modes.
  postsw perform full Smith-Waterman alignment of sequences and re-rank the database matches accordingly prior to output (currently supported in BLASTP only).
  progress=<s> provide an indication that the search is alive by outputting an asterisk (“*”) every s seconds during a search, if some other indication of activity has not been provided in the meantime. Such “keepalive” indicators may be useful when the software is invoked over a network connection. The default behavior (obtained with progress=0) is only to report the actual progress made through the database, using periods (“.”) and reports of percentages.
  prune do not prune HSP lists, but instead report all HSPs, even those that were not involved in satisfying the statistical significance threshold necessary for reporting the database sequence. NOTE: When the default Sum statistics are used, the normal pruning activity is robust; when Poisson statistics are used, some HSPs may get through the pruning process and be reported that were not involved in satisfying the statistical significance threshold.
See also: span, span1 and span2.
  putenv="NAME=VALUE" in the local environment to the BLAST search program, set the environment variable named NAME to the value VALUE.
See also: endgetenv, endputenv and getenv.
  pvalues report P-values (the default) in the initial one-line descriptions section of output.
See also: evalues.
  Q=<q> set the penalty for a gap of length one to q (default Q=9 for proteins; Q=10 for BLASTN).
See also: R.
  qframe=<f> search with the query sequence translated in the single reading frame f. This parameter is useful for speeding up a search and improving both the biological and statistical significance of the findings, when the reading frame of a translation product in the query is known in advance, such as when the query sequence entails a complete ORF. Reading frames on the top (plus) strand of the query are numbered 1, 2, 3; reading frames on the bottom (minus) strand are numbered -1, -2, -3.
See also: top, bottom, dbtop and dbbottom.
  Qoffset=<i> adjust all query sequence coordinates in the output by the fixed quantity i (default 0).
  qrecmax=<n> in a multi-sequence query file, end database searches with the query sequence numbered n.
  qrecmin=<m> in a multi-sequence query file, start database searches using the query sequence numbered m. Records are numbered starting with 1.
  qres treat as a FATAL error when the query sequence contains any invalid residue codes. By default, WARNINGs are issued for invalid residue codes, which are then skipped.
  qtype treat as a FATAL error if the query sequence appears from its letter composition to be of the wrong type (peptide or nucleotide).
  R=<r> set the per-residue penalty for extending a gap to r (default R=2 for proteins; R=10 for BLASTN).
See also: Q.
  restest causes the Bonferroni corrections used in computing statistical significance to depend upon the length in residues of each database sequence relative to the total number of residues in the database. restest is the default database-size correction method in the BLASTN, TBLASTN, and TBLASTX search modes.
See also: seqtest.
  S=<s> set the score-equivalence threshold for reporting database hits to s. Hits for a database sequence will only be reported if the statistical significance ascribed to one of its similar regions (or groups of similar regions) is at least as high as that of a single alignment with score S. By default, S is not actually used; all findings are compared against the E threshold. E and S are interchangeable, however, through standard Karlin-Dembo-Altschul statistics, with any setting of S implying an expectation threshold of its own (E = KN·e^(-λS)). If both E and S are specified on the command line, the one corresponding to the more restrictive (lower) E is used.
See also: E, gapS2, and S2.
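
The interchangeability of E and S through E = KN·e^(-λS) can be illustrated numerically. The sketch below assumes N denotes the size of the search space, as in standard Karlin-Altschul statistics; the values used for K, λ and N are illustrative only and are not the defaults of any particular scoring system.

```python
import math

def expect_from_score(score, K, lam, N):
    """E implied by a score threshold: E = K * N * exp(-lambda * S)."""
    return K * N * math.exp(-lam * score)

def score_from_expect(E, K, lam, N):
    """Inverse relation: the score S whose implied expectation is E."""
    return math.log(K * N / E) / lam

# Illustrative (assumed) parameter values.
K, lam, N = 0.13, 0.32, 1e9
for s in (60, 70, 80):
    print(s, expect_from_score(s, K, lam, N))
print(score_from_expect(10.0, K, lam, N))
```

A higher S always implies a lower E, which is why the more restrictive of the two settings can be identified by comparing their equivalent E values.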
  S2=<s> set the score threshold for saving ungapped HSPs to s. In the initial, ungapped alignment phase of a search, individual HSPs will only be saved for further use if their score is ≥S2. The default score threshold is computed from E2 and will depend on the scoring system. If both E2 and S2 are specified on the command line, the one corresponding to the more restrictive (higher) score threshold will be used.
See also: gapS2, E2 and gapE2.
  seqtest causes the Bonferroni corrections used in computing statistical significance to depend upon the number of sequences in the database. seqtest is the default database-size correction method in the BLASTP and BLASTX search modes.
NOTE: For backward compatibility with legacy BLAST software, in all search modes, including BLASTP and BLASTX, if the Z option is specified, Z is expected to be expressed in units of residues, unless seqtest is also specified.
See also: restest and Z.
  shortqueryok do not treat it as a FATAL error when the query sequence is shorter than the BLAST algorithm word length.
See also: novalidctxok and nonnegok.
  Soffset=<i> adjust all subject sequence coordinates in the output by the fixed quantity i (default 0).
  sort_by_count sort database sequences from highest to lowest by the number of HSPs identified. Multiple sort_by* options may be specified and take precedence in the order specified.
  sort_by_highscore sort database sequences from highest to lowest by the highest HSP score found. Multiple sort_by* options may be specified and take precedence in the order specified.
  sort_by_pvalue sort database sequences from lowest to highest by their best P-value. Multiple sort_by* options may be specified and take precedence in the order specified. sort_by_pvalue is the default primary sort key.
  sort_by_subjectlength sort database sequences from longest to shortest. Multiple sort_by* options may be specified and take precedence in the order specified.
  sort_by_totalscore sort database sequences from highest to lowest by the sum total score of all HSPs found. Multiple sort_by* options may be specified and take precedence in the order specified.
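
The precedence of multiple sort_by* options corresponds to an ordinary multi-key sort, with the first option given acting as the primary key. A minimal sketch follows; the record field names (pvalue, high_score and so on) are hypothetical and do not correspond to any data structure of the software.

```python
# Each option maps a hit record to a value that sorts ascending;
# "highest to lowest" criteria are negated.
SORT_KEYS = {
    "sort_by_count":         lambda hit: -hit["hsp_count"],
    "sort_by_highscore":     lambda hit: -hit["high_score"],
    "sort_by_pvalue":        lambda hit: hit["pvalue"],
    "sort_by_subjectlength": lambda hit: -hit["length"],
    "sort_by_totalscore":    lambda hit: -hit["total_score"],
}

def sort_hits(hits, options=("sort_by_pvalue",)):
    """Sort hits by the given options, in order of precedence."""
    return sorted(hits, key=lambda hit: tuple(SORT_KEYS[o](hit) for o in options))

hits = [
    {"id": "a", "pvalue": 1e-5, "high_score": 100, "hsp_count": 2,
     "length": 300, "total_score": 150},
    {"id": "b", "pvalue": 1e-5, "high_score": 120, "hsp_count": 1,
     "length": 500, "total_score": 120},
    {"id": "c", "pvalue": 1e-9, "high_score": 90, "hsp_count": 1,
     "length": 200, "total_score": 90},
]
ordered = sort_hits(hits, ("sort_by_pvalue", "sort_by_highscore"))
print([h["id"] for h in ordered])
```

Here the tie in P-value between records a and b is broken by the second option, sort_by_highscore.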
  span retain HSPs (ungapped or gapped) regardless of whether they span or are spanned by any other HSP. When this option is specified, memory requirements may increase dramatically to accommodate an increased number of HSPs that must be tracked, particularly when the sequences being compared contain short periodicity repeats and low complexity regions.
See also: span1 and span2.
  span1 discard an HSP (ungapped or gapped) when it spans or is spanned by another HSP along either the query or the subject sequence (or both). When a pair of such HSPs is found, the one with the lowest score is discarded; if their scores are equal, the longer, less information-dense HSP is discarded.
See also: span and span2.
  span2 discard an HSP (ungapped or gapped) when it spans or is spanned by another HSP along both the query and subject sequences. When a pair of such HSPs is found, the one with the lowest score is discarded; if their scores are equal, the longer, less information-dense HSP is discarded.
span2 is the default behavior.
See also: span and span1.
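
The difference between span, span1 and span2 can be sketched as a decision on a pair of HSPs. This is an interpretation of the descriptions above, not the software's implementation: "spans" is taken to mean that one HSP's coordinate interval fully contains the other's, span2 is taken to require the same HSP to contain the other on both sequences, and HSP length is measured along the query. The HSP record and the function names are hypothetical.

```python
from collections import namedtuple

HSP = namedtuple("HSP", "q_start q_end s_start s_end score")

def _contains(outer_start, outer_end, inner_start, inner_end):
    """True if the outer interval fully covers the inner interval."""
    return outer_start <= inner_start and inner_end <= outer_end

def _triggered(a, b, mode):
    a_q = _contains(a.q_start, a.q_end, b.q_start, b.q_end)
    a_s = _contains(a.s_start, a.s_end, b.s_start, b.s_end)
    b_q = _contains(b.q_start, b.q_end, a.q_start, a.q_end)
    b_s = _contains(b.s_start, b.s_end, a.s_start, a.s_end)
    if mode == "span1":                       # either sequence suffices
        return a_q or a_s or b_q or b_s
    return (a_q and a_s) or (b_q and b_s)     # span2: both sequences

def hsp_to_discard(a, b, mode="span2"):
    """Return the HSP of the pair to discard, or None to keep both."""
    if mode == "span" or not _triggered(a, b, mode):
        return None
    if a.score != b.score:
        return a if a.score < b.score else b  # lower score loses
    len_a = a.q_end - a.q_start + 1
    len_b = b.q_end - b.q_start + 1
    return a if len_a >= len_b else b         # equal scores: longer loses

outer = HSP(10, 100, 20, 110, 50)
inner = HSP(30, 60, 40, 70, 80)
print(hsp_to_discard(outer, inner, "span2"))
```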
  spoutmax=<spoutmax> establishes spoutmax as the maximum number of segment pairs to report in program output per subject sequence or pairwise comparison, independent of the number of HSPs or GSPs actually found and evaluated. If more than spoutmax segment pairs are found, the segment pairs are sorted by the criteria in effect for the search and only the first spoutmax segment pairs will be reported. The setting of spoutmax will have no effect if either hspmax or gspmax is more restrictive.
The default value for spoutmax is 0, which signifies no limit.
See also: hspmax and gspmax.
  stats gather a variety of statistics about the search (e.g., the number of word hits in each reading frame, the highest score observed, etc.) and report them in the output. Use of this option marginally impacts search speed.
  sump use Sum statistics (Karlin and Altschul, 1993) to compute joint P-values of consistent sets of alignments; the use of Sum statistics is the default behavior in all search modes.
  T=<t> set the neighborhood word score threshold for the ungapped BLAST algorithm to t. For a given word of length W in the query sequence, its neighborhood words are defined as the set of words that have scores ≥ T when aligned with it. Neighborhood words become the seed words used to find ungapped alignments by the BLAST algorithm. Lower values for T tend to yield a larger neighborhood, more potential seed words, and improved sensitivity for lower scoring alignments, but at the expense of increased memory use and run time. Higher values for T will yield a smaller (possibly empty) neighborhood word list and faster execution, at the expense of reduced sensitivity. The default T varies with the scoring matrix, word length, and between search modes. For improved sensitivity and to obtain behavior that better satisfies user expectations, identical words are included with neighborhood words in the list of potential seeds, if their score is positive but happens to be less than T.
No neighborhood words (only exactly matching words) are used by default in the BLASTN search mode; however, neighborhood words can be used even by BLASTN if a value for T is specified on the BLASTN command line. CAUTION: for the long word lengths typically employed with BLASTN, the memory required for neighborhood words can easily be prohibitive and may only be practical for shorter sequences.
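
The definition of neighborhood words above can be made concrete with a brute-force sketch. It enumerates every word of the same length and keeps those scoring at least T against the query word, then adds the identical word when its own score is positive but below T. Exhaustive enumeration is used here only for clarity and is not claimed to be how the software builds its word list; the +5/-4 scoring function is a toy stand-in based on the BLASTN match/mismatch defaults.

```python
from itertools import product

def neighborhood(word, T, score, alphabet):
    """Return the set of seed words for one query word.

    score(q, s) gives the score of query letter q against subject letter s.
    """
    words = set()
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(q, s) for q, s in zip(word, candidate))
        if total >= T:
            words.add("".join(candidate))
    # Identical words are kept even if their positive score is below T.
    self_score = sum(score(q, q) for q in word)
    if 0 < self_score < T:
        words.add(word)
    return words

def toy_score(q, s):
    """Toy +5/-4 match/mismatch scoring (assumed for illustration)."""
    return 5 if q == s else -4

for T in (6, 7, 20):
    print(T, len(neighborhood("ACG", T, toy_score, "ACGT")))
```

With this toy scoring, lowering T from 7 to 6 admits all single-mismatch words and enlarges the list, while T=20 exceeds the self-score of 15 and leaves only the identical word.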
  top used to restrict the search of a nucleotide sequence to the top (+) strand. In the TBLASTX search mode, where both query and subject are nucleotide sequences, the top option only affects the query sequence.
See also: bottom, dbtop, dbbottom and qframe.
  topcomboE=<Eratio> Eratio is the maximum ratio of Ecurrent/Ebest for which the current “topcombo” group of consistent (colinear) local alignments will be reported for a given database sequence. The "best" group is reported in the output as "Group = 1" and tends to be the most statistically significant. The default behavior is to impose no limit on this ratio, in which case all topcombo groups satisfying E are reported (up to a maximum of topcomboN groups, if specified).
See also: links and topcomboN.
  topcomboN=<n> report at most n “topcombo” groups of consistent (colinear) local alignments (HSPs). Each local alignment is allowed to be a member of only one group. Use of this option causes the addition of a "Group = #" indicator in the output for each HSP. Groups of HSPs tend to be assembled in decreasing order of statistical significance. Members of the most significant group thus tend to be reported with "Group = 1".
See also: links and topcomboE.
  ucdb force nucleotide sequence databases to be searched in their uncompressed form, with any-and-all ambiguity codes in place. This option is only effective in the BLASTN search mode for word lengths ≥ 7. Users should generally avoid specifying this option themselves, letting the software decide when to employ this search strategy. This option can increase sensitivity when ambiguity codes are present in database sequences, at the expense of memory and possibly speed. Searching the uncompressed database is the only available behavior for word lengths < 7. This option offers improved sensitivity only when searching databases in XDF format that contain ambiguity codes. The option is accepted by the software but offers no improvement in sensitivity for databases in the earlier BLAST 1.4 database format.
See also: cdb.
  V=<v> set the maximum number of one-line descriptions of significant database sequences to report in the first section of program output to v. The default limit is 500.
See also: B.
  W=<w> set the seed word length for the ungapped BLAST algorithm to w. The default word length for protein-level searches is 3 amino acids; for BLASTN searches, the default length is 11 nucleotides. Shorter word lengths may increase sensitivity, at the expense of increased run time. In all search modes, the acceptable range of word lengths is 1 ≤ w ≤ 1024.
  warnings suppress all WARNING messages.
CAUTION: important advisories may be missed if this option is used; however, if any WARNING situations should arise, the number SUPPRESSED will be reported at the end of the search.
See also: notes.
  wink=<wink> generate word hits at every winkth residue position along the query, where the default wink=1 produces neighborhood words at every position. For best sensitivity, wink should not be adjusted. Wink settings greater than 1 are best used to find identical or nearly identical sequences more rapidly. When used in conjunction with the hitdist option to obtain the highest search speed, care should be taken that desirable alignments are not precluded by these parameters. The wink parameter is only available in the licensed 2.0 software.
NOTE: When using BLASTN to search compressed nucleotide sequence databases in their compressed form, an increase in speed (and concomitant decrease in sensitivity) will not be observed unless wink is set to a value greater than the compression ratio, which is usually 4.
CAUTION: With versions of BLASTN prior to [15-Oct-2004], similarity of any length and even 100% identity can be missed when searching compressed nucleotide sequence databases in their compressed form, if wink is set to an even integer value. This is simply due to the likelihood of a phase mismatch between the compressed form of the query and the database sequence. Assigning odd values to wink can avoid such phase mismatches. The best solution, though, is to update to a newer version of BLASTN. Versions of WU BLAST dated [15-Oct-2004] and later automatically avoid the phase mismatch problem, so users need not be concerned. If you are using the wink option with BLASTN and are not running a more recent version, please update!
  wordmask=<filter> “soft mask” the query sequence using the indicated filter. A copy of the query sequence is passed through the filter program and any letters converted by it to ambiguity codes are skipped during neighborhood word or seed word generation. Unlike the filter option, the query sequence itself remains unaltered and available for alignment. Usage of the wordmask parameter is otherwise identical to that of filter, with the same set of filtering methods available for use.
See also: filter, lcmask, lcfilter and maskextra.
  wstrict when searching a nucleotide database sequence that contains one or more ambiguous residues, require that every ungapped alignment found during the initial, ungapped phase of a search actually contain an identical word hit (in the usual case of BLASTN usage) or neighborhood word hit (in the case of TBLASTN and TBLASTX). The wstrict option has no effect whatsoever on BLASTX and has no effect on BLASTP when gapped alignments (the default) are to be produced. When ungapped alignments are the desired end product from BLASTP (i.e., the -nogaps option is specified), wstrict will prevent the software from exhaustively searching diagonals that are found to contain HSPs in an effort to find other HSPs that would not be seeded by neighborhood word hits.
  X=<x> set the drop-off score for the ungapped BLAST algorithm to x. Ungapped extension of initial neighborhood word hits or seed word hits between the query and subject sequences continues until the cumulative alignment score deteriorates from the maximum value seen thus far during the extension by a quantity X or more. The default value for X is the score associated with 10 bits of significance (2^-10 < 10^-3 probability) for protein-level searches or 20 bits of significance (2^-20 < 10^-6 probability) for nucleotide-level (BLASTN) searches. Higher values for X will increase sensitivity at the expense of run time, but with both typically diminishing rapidly in their rate of change.
See also: gapX.
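
The drop-off rule described for X can be sketched as follows. Two assumptions are made and should be treated as such: the raw score corresponding to a number of bits is taken as bits·ln(2)/λ (the score x for which e^(-λx) equals 2^-bits), and a toy +5/-4 match/mismatch scoring is used for the extension. The function names and the λ value are illustrative, and only a one-directional extension is shown.

```python
import math

def bits_to_score(bits, lam):
    """Raw score x satisfying exp(-lam * x) == 2 ** -bits (assumed mapping)."""
    return bits * math.log(2) / lam

def extend_right(query, subject, x_drop, match=5, mismatch=-4):
    """Ungapped extension from position 0 with an X drop-off.

    Stops once the running score has fallen x_drop or more below the
    best score seen so far; returns (best score, length at best score).
    """
    best = running = 0
    best_len = 0
    for i, (q, s) in enumerate(zip(query, subject), start=1):
        running += match if q == s else mismatch
        if running > best:
            best, best_len = running, i
        elif best - running >= x_drop:
            break
    return best, best_len

query   = "AAAAACCCCCAAAAA"
subject = "AAAAAGGGGGAAAAA"
print(extend_right(query, subject, x_drop=10))  # stops inside the mismatches
print(extend_right(query, subject, x_drop=30))  # bridges them
print(bits_to_score(10, lam=0.32))              # lam is an illustrative value
```

The larger drop-off lets the extension cross the mismatched stretch and reach the second matching block, which illustrates the gain in sensitivity at the cost of examining more residues.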
  xmlcompact omit newline and white space characters normally reported between entities in XML documents produced with mformat=7. Their purpose is merely to improve the human readability of a document when using XML-ignorant viewers, but these characters often comprise a substantial fraction of the bytes in a document and are completely extraneous for the purposes of automated parsing and viewing with XML-aware software.
See also: mformat.
  Y=<y> set the effective length of the querY sequence (in units of residues) used in statistical significance calculations to y.
  Z=<z> set the effective size of the database (databaZe) used in statistical significance calculations to z. Unless overridden by the seqtest option, the unit of measure is residues. If seqtest is specified, the unit of measure is sequences.
See also: restest and seqtest.

Last modified: 2005-09-13


Copyright © 2004-2005 by Warren R. Gish, Saint Louis, Missouri 63108 USA. All rights reserved.