Using inversion signatures to generate draft genome sequence scaffolds

  • Authors:
  • Zanoni Dias; Ulisses Dias; João C. Setubal

  • Affiliations:
  • University of Campinas, Campinas - SP, Brazil; University of Campinas, Campinas - SP, Brazil; Virginia Tech, Blacksburg - VA

  • Venue:
  • Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
  • Year:
  • 2011

Abstract

We present a linear-time algorithm that, given a reference genome, generates a contig scaffold for a draft genome sequence represented as contigs. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes, inversions can be detected given sufficient sequencing coverage and a sufficiently close reference genome. Our algorithm correctly generates a scaffold if at least one member of every inversion signature pair is present in the contigs and no inversion signature has been overwritten during evolution. The algorithm can also generate scaffolds in the presence of any kind of inversion, although in this general case there is no guarantee that the scaffold will be completely correct. We compare the performance of SIS, the program that implements the algorithm, to that of five other scaffold-generating programs. The results from two batches of tests using real genomes and artificial contig boundaries show that SIS performs significantly better.
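
To make the inputs and outputs of reference-based scaffolding concrete, the following is a minimal sketch of a naive baseline. It is not the SIS algorithm and does not use inversion signatures in the way the paper does. The `Match` record, its fields, and the function `naive_scaffold` are hypothetical names introduced only for illustration. The sketch assumes each match between a contig and the reference is summarized by a reference position and a strand (+1 or -1); it orders contigs by median reference position, orients each by majority strand, and flags contigs whose matches disagree in strand as possible inversion breakpoints.

```python
from collections import namedtuple
from statistics import median

# Hypothetical record (not from the paper): one sequence match
# between a contig and the reference genome. strand is +1 or -1.
Match = namedtuple("Match", ["contig", "ref_pos", "strand"])


def naive_scaffold(matches):
    """Naive baseline, NOT the SIS algorithm.

    Orders contigs by the median reference position of their matches
    and orients each contig by the majority strand of its matches.
    Returns a list of (contig, orientation, mixed_strands) tuples.
    """
    by_contig = {}
    for m in matches:
        by_contig.setdefault(m.contig, []).append(m)

    placed = []
    for name, ms in by_contig.items():
        position = median(m.ref_pos for m in ms)
        # Ties between strands default to the forward orientation.
        orientation = 1 if sum(m.strand for m in ms) >= 0 else -1
        # Matches on both strands within one contig may indicate
        # that the contig spans an inversion breakpoint.
        mixed = len({m.strand for m in ms}) > 1
        placed.append((position, name, orientation, mixed))

    placed.sort()
    return [(name, orientation, mixed) for _, name, orientation, mixed in placed]


example_matches = [
    Match("c2", 500, 1),
    Match("c2", 600, -1),
    Match("c1", 100, 1),
    Match("c3", 900, -1),
    Match("c3", 950, -1),
]
scaffold = naive_scaffold(example_matches)
```

A baseline like this places contigs where they sit in the reference, so it will misplace or misorient contigs wherever the query genome has undergone an inversion relative to the reference. That limitation is the kind of case the abstract says inversion signatures are meant to address.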