Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs

Authors:
Anas Abu-Doleh;Erik Saule;Kamer Kaya;Ümit V. Çatalyürek
Affiliations:
Dept. of Biomedical Informatics, The Ohio State University and Dept. of Electrical and Computer Engineering, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University and Dept. of Electrical and Computer Engineering, The Ohio State University
Venue:
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Year:
2013

Abstract

Fast and robust algorithms and aligners have been developed to help researchers analyze genomic data, whose size has increased dramatically in the last decade due to technological advances in DNA sequencing. Not only the size but also the characteristics of the data have changed. One current concern is that read lengths are increasing. Although existing algorithms can still be used to process this new data, its size and changing structure call for new and more efficient approaches. In this work, we address the problem of accurate sequence alignment on GPUs and propose a new tool, Masher, which processes long (and short) reads efficiently and accurately. The algorithm employs a novel indexing technique that produces an index for the 3,137 Mbp hg19 genome with a memory footprint small enough to be stored on a restricted-memory device such as a GPU. The results show that Masher is faster than state-of-the-art tools and obtains good accuracy and sensitivity on sequencing data with various characteristics.
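
The abstract does not describe Masher's index layout, so the following is only a generic illustration of hash-based read mapping, not the paper's method. It is a minimal CPU sketch in Python that assumes a plain k-mer dictionary and simple seed voting. The names `build_kmer_index` and `candidate_positions` and the toy reference string are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def candidate_positions(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Collect seed hits and vote for the read start position each hit implies.

    Returns (start, votes) pairs, best-supported start first.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for ref_pos in index.get(seed, ()):
            start = ref_pos - offset  # where the read would begin in the reference
            if start >= 0:
                votes[start] += 1
    return votes.most_common()


# Toy data (hypothetical): a 25 bp reference and a 12 bp read taken from position 5.
reference = "ACGTTGCATGCCGATAGGCTTAACG"
index = build_kmer_index(reference, k=4)
read = reference[5:17]
best_start, support = candidate_positions(read, index, k=4)[0]
```

A real aligner would verify each candidate with a full alignment step. A dictionary of strings like this one would also be far too large for a whole genome, which is why a compact index matters on a memory-restricted device such as a GPU.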