WHAM: a high-throughput sequence alignment method

  • Authors:
  • Yinan Li;Allison Terrell;Jignesh M. Patel

  • Affiliations:
  • University of Wisconsin-Madison, Madison, WI, USA;University of Wisconsin-Madison, Madison, WI, USA;University of Wisconsin-Madison, Madison, WI, USA

  • Venue:
  • Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Over the last decade the cost of producing genomic sequences has dropped dramatically due to the current so called "next-gen" sequencing methods. However, these next-gen sequencing methods are critically dependent on fast and sophisticated data processing methods for aligning a set of query sequences to a reference genome using rich string matching models. The focus of this work is on the design, development and evaluation of a data processing system for this crucial "short read alignment" problem. Our system, called WHAM, employs novel hash-based indexing methods and bitwise operations for sequence alignments. It allows richer match models than existing methods and it is significantly faster than the existing state-of-the-art method. In addition, its relative speedup over the existing method is poised to increase in the future in which read sequence lengths will increase. The WHAM code is available at http://www.cs.wisc.edu/wham/.