tuple_plot: Fast pairwise nucleotide sequence comparison with noise suppression

  • Authors:
  • Karol Szafranski; Niels Jahn; Matthias Platzer

  • Affiliations:
  • Genome Analysis, Leibniz Institute for Age Research, Fritz Lipmann Institute, Beutenbergstr. 11, 07745 Jena, Germany (all three authors)

  • Venue:
  • Bioinformatics
  • Year:
  • 2006

Abstract

Summary: The program tuple_plot identifies and visualizes local similarities between two genomic sequences, typically 100 kb or longer, using the well-known dotplot principle. A dictionary of sequence words built from the input sequences is used to construct a task-specific expectancy model, which assigns significance values to pairwise word hits. The dictionary-based approach allows fast computation, with computation time scaling as O(N log N) in the size N of the input sequences. The proposed scoring scheme appreciably increases the signal-to-noise ratio and may help to improve other word-based sequence comparison approaches.

Availability: tuple_plot is available at http://genome.fli-leibniz.de/software.html and may be used under the GNU General Public License.

Contact: szafrans@fli-leibniz.de
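The abstract describes the method only at a high level. The Python sketch below illustrates the general idea of a dictionary-based dotplot with significance-weighted word hits, under stated assumptions: words of a fixed length k from both sequences are collected and sorted (the sort is where an O(N log N) cost arises, and enumerating hits adds time proportional to the number of hits reported), and each hit is scored by the rarity of its word. The word length k, the forward-strand-only matching, the frequency-based score -log10(p_a * p_b), and the names `word_dictionary`, `scored_hits`, and `Hit` are all hypothetical choices for illustration. They are not the expectancy model or code of tuple_plot.

```python
import math
from collections import namedtuple

# One dotplot point: start of the shared word in A and in B, plus its score.
Hit = namedtuple("Hit", ["pos_a", "pos_b", "word", "score"])


def word_dictionary(seq_a, seq_b, k):
    """Return (word, source, position) for every k-mer of both inputs, sorted.

    source is 0 for seq_a and 1 for seq_b. Sorting the roughly N entries
    costs O(N log N) and places identical words next to each other.
    """
    entries = []
    for source, seq in ((0, seq_a), (1, seq_b)):
        for pos in range(len(seq) - k + 1):
            entries.append((seq[pos:pos + k], source, pos))
    entries.sort()
    return entries


def scored_hits(seq_a, seq_b, k, min_score=0.0):
    """Yield forward-strand word hits between seq_a and seq_b with a rarity score.

    Hypothetical expectancy model: p_a and p_b are the empirical frequencies
    of a word among the k-mers of seq_a and seq_b, and each hit of that word
    scores -log10(p_a * p_b). Words that are frequent in the inputs (repeats,
    low-complexity stretches) therefore yield low-scoring hits, which a
    min_score threshold can suppress as noise.
    """
    n_a = len(seq_a) - k + 1  # number of k-mers in seq_a
    n_b = len(seq_b) - k + 1  # number of k-mers in seq_b
    entries = word_dictionary(seq_a, seq_b, k)
    i = 0
    while i < len(entries):
        # Find the run entries[i:j] that shares one word.
        word = entries[i][0]
        j = i
        while j < len(entries) and entries[j][0] == word:
            j += 1
        pos_a = [pos for _, src, pos in entries[i:j] if src == 0]
        pos_b = [pos for _, src, pos in entries[i:j] if src == 1]
        if pos_a and pos_b:  # the word occurs in both inputs
            score = -math.log10((len(pos_a) / n_a) * (len(pos_b) / n_b))
            if score >= min_score:
                for pa in pos_a:
                    for pb in pos_b:
                        yield Hit(pa, pb, word, score)
        i = j


if __name__ == "__main__":
    # Toy inputs; real use would involve sequences of 100 kb or more.
    for hit in scored_hits("ACGTACGTTTGCA", "TTGCAACGTACG", k=4):
        print(hit)
```

On the toy inputs the sketch reports seven hits. The two hits of ACGT, a word that occurs twice in the first sequence, score about 1.65, while hits of words occurring once in each input score about 1.95, so a threshold such as min_score=1.8 keeps only the latter. This shows, in miniature, how down-weighting frequent words can thin out repeat-driven noise in a dotplot.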