Anchoring millions of distinct reads on the human genome within seconds

Authors:
Tien Huynh; Michail Vlachos; Isidore Rigoutsos
Affiliations:
IBM T.J. Watson Research Center; IBM Zürich Research Laboratory; IBM T.J. Watson Research Center
Venue:
Proceedings of the 13th International Conference on Extending Database Technology
Year:
2010

Abstract

With the advent of next-generation DNA sequencing machines, there is an increasing need for computational tools that can accurately and expediently anchor the millions of short DNA sequences (or reads) these machines generate onto the genomes of target organisms. In this work, we describe Q-Pick, a new and efficient method for solving this problem. Q-Pick rapidly identifies and anchors such reads, which may contain wildcards, in large genomic databases, while guaranteeing the completeness of its results and the efficiency of its operation. Q-Pick requires very modest memory and computational resources and is trivially amenable to SIMD implementation; it can also be easily extended to handle longer reads, e.g., 75-mers or longer. Our experiments indicate that Q-Pick can anchor millions of distinct short reads against both strands of a mammalian genome in seconds, using a single-core computer processor.
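The abstract does not describe how Q-Pick works internally, so the sketch below is not Q-Pick. It is a deliberately naive, exhaustive baseline, included only to pin down the task the abstract describes: report every position on either strand of a genome where a read matches, treating wildcard positions as matching any base. It assumes `N` is the wildcard symbol and reports positions as 0-based forward-strand coordinates; the names `anchor_reads`, `matches_at`, and `reverse_complement` are hypothetical. Because it scans every offset, the result is complete by construction, but its cost grows with genome length × number of reads × read length, which is exactly what a method such as Q-Pick aims to avoid.

```python
from typing import Dict, List, Tuple

# Complement map for DNA bases; the assumed wildcard 'N' complements to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def matches_at(genome: str, pos: int, pattern: str, wildcard: str = "N") -> bool:
    """True if pattern matches genome[pos:pos+len(pattern)]; wildcards match any base."""
    for offset, base in enumerate(pattern):
        if base != wildcard and genome[pos + offset] != base:
            return False
    return True


def anchor_reads(genome: str, reads: List[str]) -> Dict[str, List[Tuple[int, str]]]:
    """Exhaustively anchor each read on both strands of the genome (naive baseline).

    For every read, returns all (forward-strand position, strand) hits, where
    strand is '+' when the read itself matches and '-' when its reverse
    complement matches (i.e., the read lies on the opposite strand).
    """
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for read in reads:
        found: List[Tuple[int, str]] = []
        for strand, pattern in (("+", read), ("-", reverse_complement(read))):
            # Try every offset so that no occurrence can be missed.
            for pos in range(len(genome) - len(pattern) + 1):
                if matches_at(genome, pos, pattern):
                    found.append((pos, strand))
        hits[read] = sorted(found)
    return hits


if __name__ == "__main__":
    genome = "ACGTTGCAACGT"
    print(anchor_reads(genome, ["ACGT", "GNAA", "TTTT"]))
```

On the toy genome above, the palindromic read `ACGT` is reported on both strands at positions 0 and 8, the wildcard read `GNAA` matches the forward strand at position 5 and the reverse strand at position 3, and `TTTT` has no hits.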