Understanding BLAST's Parameters
- Aadit Mathur
- Jul 11
In the previous post, we walked through how BLAST (the Basic Local Alignment Search Tool) works at a high level. We introduced the ideas of w-mers, seed extension, high-scoring segment pairs (HSPs), and filtering based on statistical significance. But all of that depended on several tunable parameters (w, X, S, E, and more) that we didn't cover in depth.
The Core Parameters
Think of BLAST parameters like the dials on a microscope. You’re not changing the specimen (your query), you’re adjusting the resolution, contrast, and brightness of what you see. Small tweaks can dramatically affect what results show up and which ones get filtered out.
Different implementations of BLAST may have slightly different parameters, or even extra ones. However, the way each parameter affects the output can be roughly described as changing both the accuracy and the sensitivity. This article will discuss the effects of various parameters upon your output, but in the end, you just have to play around with your values until you get reasonable-looking outputs. Usually, however, the defaults work fine.
The sensitivity of BLAST indicates how well it can detect more distantly related sequences. A more sensitive BLAST search will return more hits, including sequences that are only weakly similar to the query—potentially uncovering distant evolutionary relationships. A less sensitive search will return fewer, but more closely related, sequences, prioritizing specificity over breadth.
w — Word size
The length of each seed (w-mer), or "word" that BLAST uses to scan the database.
Nucleotide BLAST default: 11
Protein BLAST default: 3
A smaller w means more seeds, which means more chances of finding a match—but at the cost of speed. A larger w reduces sensitivity but speeds things up. For closely related sequences, a higher w is fine. For distant homology, you'll want to lower it.
Want to find subtle similarities in protein evolution? Lower w. Searching for nearly identical sequences? Increase w.
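To make the trade-off concrete, here is a minimal sketch in Go that counts how many distinct words two sequences share at different values of w. The sequences and the function names (wordSet, sharedWords) are made up for illustration, and it only looks at exact word matches, which is a simplification of what a real BLAST implementation does.

```go
package main

import "fmt"

// wordSet returns the set of distinct w-mers (substrings of length w) in seq.
func wordSet(seq string, w int) map[string]bool {
	words := make(map[string]bool)
	for i := 0; i+w <= len(seq); i++ {
		words[seq[i:i+w]] = true
	}
	return words
}

// sharedWords counts the distinct w-mers that occur in both sequences.
// Each shared word is a potential seed for extension.
func sharedWords(query, subject string, w int) int {
	q := wordSet(query, w)
	n := 0
	for word := range wordSet(subject, w) {
		if q[word] {
			n++
		}
	}
	return n
}

func main() {
	// Two made-up sequences that differ at two positions.
	query := "ACGTTGCAAGCT"
	subject := "ACGTAGCAAGGT"
	for _, w := range []int{3, 5, 7} {
		fmt.Printf("w=%d shared seeds=%d\n", w, sharedWords(query, subject, w))
	}
}
```

For this pair, the shared seed count falls from 6 at w = 3 to 1 at w = 5 and 0 at w = 7. With only two mismatches between the sequences, a word size of 7 already leaves nothing to extend from.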
X — Extension Dropoff
Once a matching word (w-mer) is found, BLAST extends in both directions to try to build a longer high-scoring segment pair (HSP). The X parameter controls when this extension stops.
Think of this as a patience threshold: how far will BLAST keep going before it gives up?
If the running score falls more than X below the best score seen so far, extension halts.
A lower X makes BLAST give up sooner: searches run faster but can miss longer, weakly matching regions.
A higher X encourages more aggressive searching but can slow things down and introduce noise.
Use a higher X when you're looking for faint evolutionary signals. Use a lower X when you only care about strong, clear matches.
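The dropoff rule can be sketched in a few lines of Go. This is an illustration rather than BLAST's actual code: extendRight is a hypothetical helper, it only extends to the right without gaps, and the +1/-2 match/mismatch scores are assumed values chosen for the example.

```go
package main

import "fmt"

// extendRight extends an ungapped alignment to the right, starting at
// query[qi] and subject[si]. It stops once the running score falls more
// than x below the best score seen so far, and returns that best score
// together with the number of aligned positions that achieved it.
func extendRight(query, subject string, qi, si, match, mismatch, x int) (best, length int) {
	score := 0
	for k := 0; qi+k < len(query) && si+k < len(subject); k++ {
		if query[qi+k] == subject[si+k] {
			score += match
		} else {
			score += mismatch
		}
		if score > best {
			best, length = score, k+1
		}
		if best-score > x {
			break // dropped too far below the best score: give up
		}
	}
	return best, length
}

func main() {
	// Made-up sequences: 4 matches, 2 mismatches, then 6 more matches.
	query := "ACGTTTACGTAC"
	subject := "ACGTGGACGTAC"
	for _, x := range []int{3, 5} {
		best, length := extendRight(query, subject, 0, 0, 1, -2, x)
		fmt.Printf("X=%d best score=%d length=%d\n", x, best, length)
	}
}
```

With X = 3 the extension gives up inside the mismatched stretch and reports a score of 4 over 4 positions. With X = 5 it survives the dip and reaches the matching region beyond it, ending with a score of 6 over all 12 positions.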
S — Minimum Score Threshold
Once BLAST builds an HSP, it asks: Is this even worth keeping? That’s where the S parameter comes in.
HSPs scoring below S are discarded.
This acts as an early filter—if the alignment doesn’t score high enough, it never makes it to later steps.
Raising S = more stringent filter, fewer results.
Lowering S = more permissive filter, more results (but potentially more junk).
If you're overwhelmed by spurious hits, try nudging S up slightly.
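As a rough sketch, the S filter is just a cutoff applied to a list of HSPs. The HSP struct and filterByScore function below are hypothetical names invented for this example, and the scores are made up.

```go
package main

import "fmt"

// HSP is a hypothetical record for a high-scoring segment pair.
type HSP struct {
	QueryStart, SubjectStart, Length, Score int
}

// filterByScore keeps only the HSPs whose score is at least s;
// anything scoring below s is discarded.
func filterByScore(hsps []HSP, s int) []HSP {
	var kept []HSP
	for _, h := range hsps {
		if h.Score >= s {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	// Made-up HSPs with a range of scores.
	hsps := []HSP{
		{QueryStart: 0, SubjectStart: 12, Length: 30, Score: 42},
		{QueryStart: 5, SubjectStart: 80, Length: 11, Score: 13},
		{QueryStart: 40, SubjectStart: 7, Length: 18, Score: 25},
	}
	for _, s := range []int{10, 20, 40} {
		fmt.Printf("S=%d keeps %d HSPs\n", s, len(filterByScore(hsps, s)))
	}
}
```

Raising S from 10 to 20 to 40 keeps 3, then 2, then 1 of these HSPs.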
E — Expect Value (E-value)
The infamous e-value. This is the statistical threshold most users interact with, even if they don’t touch any other setting.
The e-value estimates how many hits with that score (or better) you’d expect by chance in a database of that size.
Lower e-values = higher confidence.
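Let’s say your alignment has an e-value of 1e-5. That means you’d expect about 0.00001 hits that good to turn up by random chance in a search of this size. (For values this small, the expected count and the probability of seeing at least one such hit are nearly the same number.) Pretty solid.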
Common defaults:
E = 10 means "show me anything that might be interesting."
For more selective searches, you might lower to E = 0.01 or even 1e-10.
Lower e-values are especially important in big databases—larger search space means more chances for random hits.
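To see the database-size effect in numbers, here is a small Go sketch using the standard Karlin-Altschul formula, E = K * m * n * exp(-lambda * S), where m is the query length and n is the total database length. This post doesn't derive the formula. The K and lambda values below are placeholder assumptions, since the real ones depend on the scoring system, and expectValue is a made-up function name.

```go
package main

import (
	"fmt"
	"math"
)

// expectValue applies the Karlin-Altschul formula
//   E = K * m * n * exp(-lambda * S)
// where m is the query length, n is the total database length,
// and S is the raw alignment score.
func expectValue(score, queryLen, dbLen, k, lambda float64) float64 {
	return k * queryLen * dbLen * math.Exp(-lambda*score)
}

func main() {
	// Assumed statistical parameters, for illustration only.
	const k, lambda = 0.134, 0.318

	score, queryLen := 50.0, 300.0
	for _, dbLen := range []float64{1e6, 1e7, 1e8} {
		e := expectValue(score, queryLen, dbLen, k, lambda)
		fmt.Printf("database length %.0e -> E = %.3g\n", dbLen, e)
	}
}
```

The same score gets a ten times larger e-value each time the database grows tenfold, which is why a hit that looks convincing in a small database can be unremarkable in a large one.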
Gap Penalties — Existence & Extension
When aligning sequences, sometimes gaps (insertions or deletions) are needed. But gaps shouldn't come for free—otherwise BLAST would just align everything with lots of empty space.
Two key penalties:
Gap existence cost (G): the penalty for starting a new gap.
Gap extension cost (E): the cost for continuing a gap. (This E is a separate parameter from the e-value above; the two just happen to share a letter.)
Default values work fine in most cases, but if your sequences are prone to lots of small indels (insertions/deletions), you might consider tweaking these.
Higher penalties = discourage gaps (more rigid alignments).
Lower penalties = allow more flexibility (possibly more sensitive, but noisier).
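Here is a minimal Go sketch of how the two penalties combine. It assumes the convention where every gapped position, including the first, pays the extension cost on top of a one-time existence cost. The gapCost name and the 11/1 values are illustrative choices for the example.

```go
package main

import "fmt"

// gapCost returns the affine penalty for a single gap of the given length:
// a one-time existence cost plus an extension cost per gapped position.
func gapCost(length, existence, extension int) int {
	if length <= 0 {
		return 0 // no gap, no penalty
	}
	return existence + length*extension
}

func main() {
	// Assumed penalties, for illustration only.
	existence, extension := 11, 1

	oneLongGap := gapCost(3, existence, extension)
	threeShortGaps := 3 * gapCost(1, existence, extension)

	fmt.Println("one gap of length 3:   ", oneLongGap)
	fmt.Println("three gaps of length 1:", threeShortGaps)
}
```

Under these assumptions, one gap of length 3 costs 14 while three separate single-position gaps cost 36. Splitting the penalty into existence and extension costs makes one longer indel much cheaper than many scattered ones.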
Tuning BLAST
For most routine work, the default BLAST parameters work fine. The defaults have been tuned over years to balance sensitivity, speed, and accuracy.
But if you’re:
Exploring deep homology,
Working with highly divergent species,
Or filtering for only the most confident hits,
…then adjusting these parameters can be useful.
Note: When tweaking, change one parameter at a time. See how it affects your output. If you change five things at once, you’ll never know what helped and what hurt.
Recap of Parameters
Parameter | Controls | Higher Value Does What |
w (word size) | Sensitivity vs. speed | Faster, less sensitive |
X (dropoff) | Extension aggressiveness | Longer extensions, slower |
S (score threshold) | Filter for good HSPs | Fewer HSPs, higher quality |
E (e-value threshold) | Statistical significance | More results, less reliable |
Gap penalties | Gap tolerance | Fewer gaps, stricter alignments |
Next Steps
Now that we understand the dials and levers behind BLAST, we can move from theory to implementation. In the next post, we’ll step through each part of the algorithm and begin writing some code to build a minimal (but functioning!) BLAST-inspired search tool using Go.