Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Reducing the space requirement of suffix trees
Software—Practice & Experience
An experimental study of an opportunistic index
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Database indexing for large DNA and protein sequence collections
The VLDB Journal — The International Journal on Very Large Data Bases
Opportunistic data structures with applications
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
New text indexing functionalities of the compressed suffix arrays
Journal of Algorithms
OASIS: an online and accurate technique for local-alignment searches on biological sequences
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Hi-index | 0.00 |
Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing long DNA sequences such as the human genome (about 3 billion characters) in the main memory [5,13,16]. However, these indexes are designed for exact pattern matching, which is too stringent for most biological applications. The demand is often on finding local alignments (pairs of similar substrings with gaps allowed). In this paper, we show how to build a software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments with any pattern P. Experiments reveal that BWT-SW is very efficient (e.g., aligning a pattern of length 3,000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically, using a simpler model (with gaps disallowed) and random strings. We find that the expected running time is O(|T|0.628|P|). As far as we know, BWT-SW is the first practical tool that can find all local alignments.