Design of an efficient out-of-core read alignment algorithm

Authors:
Arun S. Konagurthu;Lloyd Allison;Thomas Conway;Bryan Beresford-Smith;Justin Zobel
Affiliations:
National ICT Australia, Victoria Research Lab., Dept. of Electronics and Electrical Engineering, The Univ. of Melbourne, Parkville, Victoria, Australia and Department of Computer Science and Softw ...;National ICT Australia, Victoria Research Laboratory, Department of Electronics and Electrical Engineering, The University of Melbourne, Parkville, Victoria, Australia;National ICT Australia, Victoria Research Lab., Department of Electronics and Electrical Eng., The Univ. of Melbourne, Parkville, Victoria, Australia and Department of Computer Science and Softwar ...;National ICT Australia, Victoria Research Lab., Department of Electronics and Electrical Eng., The Univ. of Melbourne, Parkville, Victoria, Australia and Department of Computer Science and Softwar ...;Department of Computer Science and Software Eng., The Univ. of Melbourne, Parkville, Victoria, Australia and National ICT Australia, Victoria Research Lab., Department of Electronics and Electrica ...
Venue:
WABI'10 Proceedings of the 10th international conference on Algorithms in bioinformatics
Year:
2010

Abstract

New genome sequencing technologies are poised to enter the sequencing landscape, producing read data at significantly higher throughput, at unprecedented speeds, and at lower cost per run. However, current in-memory methods for aligning a set of reads to one or more reference genomes are ill-equipped to handle the expected growth in read throughput from these technologies. This paper reports the design of a new out-of-core read mapping algorithm, Syzygy, which can scale to large volumes of read and genome data. The algorithm is designed to run in a constant, user-stipulated amount of main memory (small enough to fit on standard desktops), irrespective of the sizes of the read and genome data. Syzygy achieves a superior spatial locality of reference that allows all large data structures used in the algorithm to be maintained on disk. We compare our prototype implementation with several popular read alignment programs. Our results demonstrate that Syzygy can scale to very large read volumes while using only a fraction of the memory those programs require, without sacrificing performance.
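
The abstract does not describe Syzygy's internals, so the following is not the paper's algorithm. It is a minimal, generic sketch of one well-known way to get out-of-core behaviour with a fixed memory budget: sort both data sets in bounded-size runs on disk, then match them with a purely sequential merge join, which has good spatial locality. All names (`external_sort`, `merge_join`, `candidate_hits`, `max_in_memory`) are hypothetical, and the matching is a toy (exact match of each read's first k bases).

```python
import heapq
import itertools
import os
import tempfile


def kmers(seq, k):
    """Yield (k-mer, position) for every k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], i


def read_run(path):
    """Stream (key, value) records back from one sorted run file."""
    with open(path) as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t")
            yield key, int(value)


def external_sort(records, max_in_memory):
    """Sort (key, int) records while holding at most max_in_memory in RAM."""
    paths = []
    it = iter(records)
    try:
        while True:
            chunk = sorted(itertools.islice(it, max_in_memory))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".run")
            paths.append(path)
            with os.fdopen(fd, "w") as f:
                for key, value in chunk:
                    f.write(f"{key}\t{value}\n")
        # k-way merge reads every run sequentially, one record at a time.
        yield from heapq.merge(*(read_run(p) for p in paths))
    finally:
        for p in paths:
            os.remove(p)


def merge_join(left, right):
    """Join two key-sorted streams; yield (key, left_value, right_value)."""
    right_groups = itertools.groupby(right, key=lambda kv: kv[0])
    rkey, rvals = None, []
    for lkey, lgroup in itertools.groupby(left, key=lambda kv: kv[0]):
        while rkey is None or rkey < lkey:
            nxt = next(right_groups, None)
            if nxt is None:
                return
            # Simplification: one group of equal keys is held in memory.
            rkey, rvals = nxt[0], [v for _, v in nxt[1]]
        if rkey == lkey:
            for _, lv in lgroup:
                for rv in rvals:
                    yield lkey, lv, rv


def candidate_hits(genome, reads, k, max_in_memory):
    """Yield (read_index, genome_position) where a read's first k bases match."""
    genome_stream = external_sort(kmers(genome, k), max_in_memory)
    seeds = ((r[:k], i) for i, r in enumerate(reads))
    read_stream = external_sort(seeds, max_in_memory)
    for _, read_index, pos in merge_join(read_stream, genome_stream):
        yield read_index, pos


if __name__ == "__main__":
    hits = candidate_hits("ACGTACGTTG", ["ACGT", "GTTG", "TTTT"], k=4, max_in_memory=3)
    print(sorted(hits))
```

In this sketch the only in-memory state is one run of at most `max_in_memory` records during sorting, plus one record per run and one group of equal keys during merging; everything else lives in temporary files that are read front to back. A real read mapper would also have to handle mismatches, both strands, and highly repetitive k-mers.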