Libgapmis: An ultrafast library for short-read single-gap alignment

  • Authors:
  • Simon Berger;Alexandros Stamatakis;Solon P. Pissis;Tomas Flouri;Nikolaos Alachiotis

  • Affiliations:
  • Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany;Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany;Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany;Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany;Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany

  • Venue:
  • BIBMW '12 Proceedings of the 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)
  • Year:
  • 2012

Abstract

A broad variety of short-read alignment programmes has been released recently to address the task of mapping tens of millions of short reads to a reference genome, each placing emphasis on different aspects of the problem. Although all programmes allow for a small number of alignment mismatches, some of them either perform poorly when gap insertions are allowed or do not allow gap insertions at all. The seed-and-extend strategy is applied in most of these programmes: after a fast alignment between a fragment of the reference sequence and a high-quality fragment of a short read (the seed), an important problem is to extend the alignment between a relatively short succeeding fragment of the reference sequence and the remaining low-quality fragment of the read, allowing a number of mismatches and the insertion of gaps in the alignment. However, the length of the short reads, in combination with the gap occurrence frequency observed in various applications, suggests that the single-gap alignment of (parts of) those reads is desirable. In this article, we present libgapmis, an ultrafast library for pairwise short-read single-gap alignment, including accelerated SSE-based and GPU-based versions. It implements an algorithm that computes a modified version of the traditional dynamic programming matrix for sequence alignment to solve the above alignment problem. We show that the library functions of the CPU-based version are up to 20x faster than competing programmes, while the SSE-based and GPU-based versions are up to 6x and 11x faster than our CPU-based implementation, respectively. The functions made available via our library can be seamlessly integrated into any short-read alignment pipeline.
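To make the notion of single-gap alignment concrete, the following is a minimal illustrative sketch in Python. It is not the libgapmis algorithm, which the abstract describes as computing a modified dynamic programming matrix, and it does not use the library's API. The function name `single_gap_align` is hypothetical. The sketch makes two simplifying assumptions: both sequences are aligned end to end, so the single gap has a fixed length equal to their length difference, and the score is simply the number of mismatches, with no gap penalty.

```python
def single_gap_align(read: str, ref: str):
    """Align two sequences end to end using at most one gap.

    Hypothetical helper for illustration only (not part of libgapmis).
    The gap is placed in the shorter sequence and its length equals the
    length difference; the position minimising mismatches is chosen.
    Returns (mismatches, gap_pos, gap_len, aligned_read, aligned_ref).
    """
    # The shorter sequence receives the gap.
    swapped = len(read) > len(ref)
    short, long_ = (ref, read) if swapped else (read, ref)
    gap_len = len(long_) - len(short)
    m = len(short)

    # prefix[i]: mismatches between short[:i] and long_[:i]
    prefix = [0] * (m + 1)
    for i in range(m):
        prefix[i + 1] = prefix[i] + (short[i] != long_[i])

    # suffix[i]: mismatches between short[i:] and long_[i + gap_len:]
    suffix = [0] * (m + 1)
    for i in range(m - 1, -1, -1):
        suffix[i] = suffix[i + 1] + (short[i] != long_[i + gap_len])

    # Try every gap position; ties resolve to the leftmost position.
    gap_pos = min(range(m + 1), key=lambda i: prefix[i] + suffix[i])
    mismatches = prefix[gap_pos] + suffix[gap_pos]

    gapped = short[:gap_pos] + "-" * gap_len + short[gap_pos:]
    aligned_read, aligned_ref = (long_, gapped) if swapped else (gapped, long_)
    return mismatches, gap_pos, gap_len, aligned_read, aligned_ref


if __name__ == "__main__":
    result = single_gap_align("ACGTACGT", "ACGTTTACGT")
    print(result[3])  # gapped read
    print(result[4])  # reference fragment
```

Because the prefix and suffix mismatch counts are precomputed, every gap position is evaluated in linear time overall. A practical tool would additionally bound the gap length, penalise the gap, and leave the end of the reference fragment free, which is where a dynamic programming formulation such as the one the abstract mentions becomes necessary.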