WHAM: a high-throughput sequence alignment method
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Over the last decade, the cost of producing genomic sequences has dropped dramatically thanks to so-called next-generation sequencing methods. These methods, however, depend critically on fast and sophisticated data processing for aligning a set of query sequences to a reference genome under rich string matching models. This work focuses on the design, development and evaluation of a data processing system for this crucial “short read alignment” problem. Our system, called WHAM, employs hash-based indexing methods and bitwise operations for sequence alignment. It supports rich match models and is significantly faster than existing state-of-the-art methods. In addition, its relative speedup over existing methods is expected to grow as read lengths increase.
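The abstract names two ingredients, hash-based indexing and bitwise operations, without giving details. The following is a minimal sketch of how such ingredients can be combined in general; it is not WHAM's actual implementation. The 2-bit base encoding, the function names (encode, mismatches, build_index, align), the fixed seed length, and the mismatch-only match model are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed 2-bit encoding of DNA bases (illustrative, not taken from the paper).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def mismatches(a, b, length):
    """Count differing bases between two packed sequences of `length` bases."""
    diff = a ^ b
    # Mask with a 1 in the low bit of every 2-bit pair: 0b0101...01.
    low_mask = ((1 << (2 * length)) - 1) // 3
    # A base differs if either of its two bits differs; fold both onto the low bit.
    folded = (diff | (diff >> 1)) & low_mask
    return bin(folded).count("1")


def build_index(reference, seed_len):
    """Hash table from each seed_len-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def align(read, reference, index, seed_len, max_mismatches):
    """Return sorted (start, mismatch_count) pairs within max_mismatches.

    Non-overlapping seeds of the read are looked up in the hash index; each
    candidate start is then verified with the bitwise mismatch count. All
    alignments are guaranteed to be found only if the read contains at least
    max_mismatches + 1 seeds (pigeonhole argument).
    """
    hits = {}
    packed_read = encode(read)
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = encode(reference[start:start + len(read)])
            count = mismatches(packed_read, window, len(read))
            if count <= max_mismatches:
                hits[start] = count
    return sorted(hits.items())


reference = "ACGTACGTTGCAAGCT"
index = build_index(reference, seed_len=4)
result = align("TTGCTAGC", reference, index, seed_len=4, max_mismatches=1)
```

In this toy run the read differs from the reference window starting at position 7 in a single base, so it is reported with one mismatch. The XOR-and-fold step compares all bases of a read at once, which is the kind of work that bitwise operations can save over character-by-character comparison.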