Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs

  • Authors:
  • Anas Abu-Doleh;Erik Saule;Kamer Kaya;Ümit V. Çatalyürek

  • Affiliations:
  • Dept. of Biomedical Informatics, The Ohio State University and Dept. of Electrical and Computer Engineering, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University;Dept. of Biomedical Informatics, The Ohio State University and Dept. of Electrical and Computer Engineering, The Ohio State University

  • Venue:
  • Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
  • Year:
  • 2013

Abstract

Fast and robust algorithms and aligners have been developed to help researchers analyze genomic data, whose size has increased dramatically over the last decade owing to technological advances in DNA sequencing. Not only the size but also the characteristics of the data have changed. One current concern is that read lengths are increasing. Although existing algorithms can still process these new data, their size and changing structure call for new, more efficient approaches. In this work, we address the problem of accurate sequence alignment on GPUs and propose a new tool, Masher, which processes long (and short) reads efficiently and accurately. The algorithm employs a novel indexing technique that produces an index for the 3,137 Mbp hg19 human reference genome with a memory footprint small enough to be stored on a restricted-memory device such as a GPU. The results show that Masher is faster than state-of-the-art tools and achieves good accuracy and sensitivity on sequencing data with various characteristics.
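The abstract does not describe how Masher's index is built. The sketch below is therefore only a generic illustration of the family of technique it belongs to, not Masher's method. It shows a plain hash-based k-mer index in which exact seed hits from a read vote for candidate start positions on the reference. It also shows a back-of-the-envelope footprint calculation, assuming one 4-byte position per reference base, to suggest why a naive index for a genome of roughly 3,137 Mbp is too large for GPU memory. All function names (`build_kmer_index`, `candidate_locations`, `naive_index_bytes`) and parameters (`k`, `step`, `bytes_per_position`) are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Generic (not Masher's) index: map each k-mer to every reference position where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def candidate_locations(read: str, index: dict[str, list[int]], k: int, step: int = 1) -> Counter:
    """Each exact k-mer hit votes for the reference start it implies (ref_pos - read offset)."""
    votes: Counter = Counter()
    for offset in range(0, len(read) - k + 1, step):
        for ref_pos in index.get(read[offset:offset + k], []):
            votes[ref_pos - offset] += 1
    return votes


def naive_index_bytes(genome_bp: int, bytes_per_position: int = 4) -> int:
    """Assumed naive layout: one fixed-width position entry per reference base."""
    return genome_bp * bytes_per_position


reference = "ACGTACGTTAGCCGATAGGCTTACG"
read = reference[8:20]  # a read sampled from position 8
index = build_kmer_index(reference, k=4)
print(candidate_locations(read, index, k=4).most_common(1))
print(naive_index_bytes(3_137_000_000))
```

In this toy example, all nine 4-mers of the read vote for start position 8. The footprint calculation gives about 12.5 GB for positions alone, which exceeds the device memory of typical GPUs of that period. Real GPU mappers use compact, flat arrays instead of Python dictionaries and sample seeds sparsely (a larger `step`), and the abstract indicates that Masher's own indexing scheme is specifically designed to shrink this footprint.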