Over the last decade, the cost of producing genomic sequences has dropped dramatically thanks to so-called "next-gen" sequencing methods. These methods, however, depend critically on fast, sophisticated data processing to align a set of query sequences to a reference genome under rich string matching models. This work focuses on the design, development, and evaluation of a data processing system for this crucial "short read alignment" problem. Our system, called WHAM, employs novel hash-based indexing methods and bitwise operations for sequence alignment. It supports richer match models than existing methods and is significantly faster than the existing state-of-the-art method. Moreover, its relative speedup over that method is poised to grow as read sequence lengths increase. The WHAM code is available at http://www.cs.wisc.edu/wham/.
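To make the general idea of hash-based indexing combined with bitwise comparison concrete, the following is a minimal sketch, not WHAM's actual index or algorithm. It assumes a substitution-only (Hamming) match model, and all names (`encode`, `mismatches`, `build_index`, `align`, `seed_len`) are hypothetical. Fixed-length seeds of the reference are hashed to their positions, and each candidate position is verified by packing bases into 2 bits and counting differing bases with XOR.

```python
from collections import defaultdict

# Assumed 2-bit encoding of the four DNA bases (illustrative choice).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack a DNA string into an int, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def mismatches(a, b, length):
    """Count differing bases between two packed sequences of `length` bases."""
    diff = a ^ b
    low_bits = int("01" * length, 2)  # selects the low bit of each 2-bit base
    # A base differs if either of its two bits differs; fold both onto the low bit.
    per_base = (diff | (diff >> 1)) & low_bits
    return bin(per_base).count("1")


def build_index(reference, seed_len):
    """Hash every substring of length `seed_len` to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def align(read, reference, index, seed_len, max_mismatches):
    """Return reference positions where `read` matches with few substitutions.

    Uses disjoint seeds of the read as hash probes. By the pigeonhole
    principle the result is complete only when the number of disjoint
    seeds exceeds `max_mismatches`.
    """
    n = len(read)
    packed_read = encode(read)
    hits = set()
    for offset in range(0, n - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + n > len(reference):
                continue
            window = encode(reference[start:start + n])
            if mismatches(packed_read, window, n) <= max_mismatches:
                hits.add(start)
    return sorted(hits)


reference = "ACGTACGTTTGACCAGT"
index = build_index(reference, seed_len=4)
print(align("GTTTGTCC", reference, index, seed_len=4, max_mismatches=1))  # [6]
```

The read in the usage lines differs from the reference at one base, so its first seed still hits the index exactly and the bitwise check confirms the alignment. A production aligner would pack the reference once rather than re-encoding each window, and would need to handle insertions, deletions, and ambiguous bases, which this sketch omits.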