WHAM: a high-throughput sequence alignment method

Authors:
Yinan Li;Allison Terrell;Jignesh M. Patel
Affiliations:
University of Wisconsin-Madison, Madison, WI, USA;University of Wisconsin-Madison, Madison, WI, USA;University of Wisconsin-Madison, Madison, WI, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

Over the last decade, the cost of producing genomic sequences has dropped dramatically thanks to so-called "next-gen" sequencing methods. These methods, however, depend critically on fast and sophisticated data processing for aligning a set of query sequences to a reference genome using rich string matching models. This work focuses on the design, development, and evaluation of a data processing system for this crucial "short read alignment" problem. Our system, called WHAM, employs novel hash-based indexing methods and bitwise operations for sequence alignment. It supports richer match models than existing methods and is significantly faster than the existing state-of-the-art method. In addition, its relative speedup over the existing method is poised to grow as read sequence lengths increase. The WHAM code is available at http://www.cs.wisc.edu/wham/.
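
The abstract names hash-based indexing and bitwise operations but does not describe how WHAM combines them, so the following is only a generic illustration of the two ideas, not WHAM's actual algorithm. It assumes a 2-bit nucleotide encoding, a mismatch-only (Hamming distance) match model, and fixed-length seeds; the names `encode`, `mismatches`, `build_index`, and `align` are hypothetical.

```python
from collections import defaultdict

# Assumed 2-bit encoding of nucleotides (illustrative, not taken from the paper).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack a DNA string into one integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def mismatches(x, y, length):
    """Count differing bases between two packed sequences of equal length."""
    diff = x ^ y
    # (4**length - 1) / 3 is the bit pattern 0b0101...01: the low bit of each pair.
    low_mask = (1 << (2 * length)) // 3
    # A base differs if either bit of its 2-bit pair differs.
    per_base = (diff | (diff >> 1)) & low_mask
    return bin(per_base).count("1")


def build_index(reference, seed_len):
    """Hash every seed_len-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[encode(reference[pos:pos + seed_len])].append(pos)
    return index


def align(read, reference, index, seed_len, max_mismatches):
    """Return sorted (start, mismatch_count) pairs within the mismatch budget.

    Pigeonhole argument: if the read contains at least max_mismatches + 1
    non-overlapping seeds, any placement with at most max_mismatches
    mismatches matches at least one seed exactly.
    """
    n = len(read)
    read_bits = encode(read)
    hits = {}
    for offset in range(0, n - seed_len + 1, seed_len):
        seed = encode(read[offset:offset + seed_len])
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + n > len(reference) or start in hits:
                continue
            # Verify the whole candidate with XOR and a bit count.
            d = mismatches(read_bits, encode(reference[start:start + n]), n)
            if d <= max_mismatches:
                hits[start] = d
    return sorted(hits.items())


reference = "ACGTTGCAAGGCTTAACCGG"
index = build_index(reference, seed_len=4)
read = "TGCAATGCTTAA"  # reference[4:16] with one substituted base
print(align(read, reference, index, seed_len=4, max_mismatches=1))  # [(4, 1)]
```

The index lookup narrows the search to a few candidate positions, and the XOR-based check compares all bases of a candidate at once. Richer match models, such as those allowing insertions and deletions, would need a different verification step than the one sketched here.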