Parallel and memory-efficient reads indexing for genome assembly

  • Authors:
  • Guillaume Chapuis;Rayan Chikhi;Dominique Lavenier

  • Affiliations:
  • Computer Science Department, ENS Cachan/IRISA, Rennes, France;Computer Science Department, ENS Cachan/IRISA, Rennes, France;Computer Science Department, ENS Cachan/IRISA, Rennes, France

  • Venue:
  • PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
  • Year:
  • 2011

Abstract

As genomes, transcriptomes and meta-genomes are being sequenced at a faster pace than ever, there is a pressing need for efficient genome assembly methods. Two practical issues in assembly are heavy memory usage and long execution time during the read indexing phase. This article proposes a parallel and memory-efficient method for indexing reads prior to assembly. Specifically, it designs a hash-based structure that stores a reduced amount of read information, and it filters erroneous entries on the fly during index construction. A prototype implementation has been developed and applied to actual Illumina short reads. Benchmark evaluation shows that this indexing method requires significantly less memory than the indexing methods of popular assemblers.
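The abstract does not give the layout of the index, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes the index is keyed by canonical k-mers, that each entry keeps only an occurrence count instead of the full reads, and that a k-mer seen fewer than min_count times is treated as a likely sequencing error. The names canonical, build_index, k and min_count are hypothetical, and the sketch is sequential: the parallel aspect of the method is not shown.

```python
# Illustrative sketch only: a hash-based k-mer index that keeps a count per
# k-mer (not the reads themselves) and admits a k-mer into the main index
# only once it has been seen min_count times.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def build_index(reads, k, min_count=2):
    """Build a dict mapping canonical k-mers to counts, filtering rare k-mers."""
    pending = {}  # k-mers still below the threshold (assumed to be possible errors)
    index = {}    # k-mers that reached the threshold

    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:
                continue  # skip k-mers containing an undetermined base
            kmer = canonical(kmer)
            if kmer in index:
                index[kmer] += 1
                continue
            count = pending.get(kmer, 0) + 1
            if count >= min_count:
                # Promote the k-mer to the main index once it is seen often enough.
                pending.pop(kmer, None)
                index[kmer] = count
            else:
                pending[kmer] = count
    return index


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGTTT"]
    print(build_index(reads, k=4))
```

In this toy input, the k-mers that occur only in the third read never reach the threshold and therefore never enter the returned index. A memory-conscious implementation would also need to bound the size of the pending table, which this sketch does not attempt.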