Updates to the RMAP short-read mapping software

  • Authors:
  • Andrew D. Smith;Wen-Yu Chung;Emily Hodges;Jude Kendall;Greg Hannon;James Hicks;Zhenyu Xuan;Michael Q. Zhang

  • Affiliations:
  • -;-;-;-;-;-;-;-

  • Venue:
  • Bioinformatics
  • Year:
  • 2009

Quantified Score

Hi-index 3.84

Visualization

Abstract

Summary: We report on a major new version of the RMAP software for mapping reads from short-read sequencing technology. General improvements to accuracy and space requirements are included, along with novel functionality. Included in the RMAP software package are tools for mapping paired-end reads, mapping using more sophisticated use of quality scores, collecting ambiguous mapping locations and mapping bisulfite-treated reads. Availability: The applications described in this note are available for download at http://www.cmb.usc.edu/people/andrewds/rmap and are distributed as Open Source software under the GPLv3.0. The software has been tested on Linux and OS X platforms. Contact:andrewds@usc.edu; mzhang@cshl.edu The RMAP algorithm was introduced by (Smith et al., 2008) as one of the earliest available programs for mapping reads from the Illumina second-generation sequencing technology. One important contribution of RMAP was to incorporate the use of quality scores directly into the mapping process: read positions with too low a quality score were not considered while mapping, and that quality score cutoff could be adjusted by the user. Subsequently, numerous mapping algorithm have appeared (Langmead et al., 2009; Li,H. et al., 2008; Li,R. et al., 2008; Lin et al., 2008; Schatz, 2009; Yanovsky et al., 2008), with improvements in both efficiency and breadth of functionality (e.g. ability to map paired-end reads; integrated SNP calling). Investigators requiring solutions to mapping problems now have many options. As new applications of short-read sequencing emerge, many variations on the analysis task of read mapping emerge. Diversity in performance characteristics of existing mapping tools becomes potentially valuable. We report the first major update to RMAP. The basic algorithmic framework in RMAP is still to preprocess reads and scan the genome, but several modifications have been made and much additional functionality has been included. Importantly, RMAP has a memory footprint that depends on the number of reads being mapped. This feature allows RMAP to be used effectively in cluster environments with commodity nodes, because partitioning the reads allows natural parallelizations with linear reduction in memory requirements per processor core used. Included in this release of the RMAP software package is functionality for mapping paired-end reads, making more sophisticated use of quality scores, collecting mapping locations for ambiguously mapping reads and mapping bisulfite-treated reads.