Variable length local decoding and alignment-free sequence comparison

  • Authors:
  • Gilles Didier;Eduardo Corel;Ivan Laprevotte;Alex Grossmann;Claudine Landès-Devauchelle

  • Affiliations:
  • Institut de Mathématiques de Luminy, CNRS FRE 3529, Aix-Marseille Université 13288 Marseille Cedex 9, France;Institut für Mikrobiologie und Genetik, Georg-August Universität, 37077 Göttingen, Germany;Laboratoire Statistique et Génome, CNRS UMR 8071, Université dEvry-Val-dEssonne, 91037 Evry, France;Laboratoire Statistique et Génome, CNRS UMR 8071, Université dEvry-Val-dEssonne, 91037 Evry, France;Laboratoire Statistique et Génome, CNRS UMR 8071, Université dEvry-Val-dEssonne, 91037 Evry, France

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2012

Abstract

We present the variable length local decoding, a method which augments the alphabet of a sequence or a set of sequences. Roughly speaking, the approach distinguishes several types of symbols/nucleotides according to their contexts in the sequences. These contexts have variable lengths and are defined from a prefix code. We first give an original algorithm computing the decoding with a complexity linear both in time and memory space. Next, the approach is applied to alignment-free sequence comparison. We give a heuristic way to select context lengths relevant to this question. The comparison of sequences itself is based on the composition in "augmented" symbols of their variable length local decodings. The results of this comparison are illustrated on a biological alignment.
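
To make the idea of comparing sequences by their composition in context-augmented symbols more concrete, here is a minimal Python sketch. It is a deliberate simplification and not the algorithm of the paper: it uses a single fixed context length k rather than variable lengths defined from a prefix code, it labels each symbol only by the k symbols that follow it, and it makes no attempt at the linear time and memory bound stated in the abstract. The function names (augment, composition, distance) and the choice of Euclidean distance between frequency vectors are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def augment(seq, k):
    """Label each position with (symbol, the next k symbols).

    Assumption: a fixed context length k stands in for the variable
    length contexts of the abstract. Contexts near the end of the
    sequence are simply truncated.
    """
    return [(seq[i], seq[i + 1:i + 1 + k]) for i in range(len(seq))]


def composition(seq, k):
    """Relative frequency of each augmented symbol in seq."""
    labels = augment(seq, k)
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def distance(seq_a, seq_b, k=2):
    """Euclidean distance between the two augmented-symbol compositions.

    Assumption: Euclidean distance is one possible choice; the abstract
    does not say which dissimilarity the authors use.
    """
    comp_a = composition(seq_a, k)
    comp_b = composition(seq_b, k)
    labels = set(comp_a) | set(comp_b)
    return sqrt(sum((comp_a.get(x, 0.0) - comp_b.get(x, 0.0)) ** 2
                    for x in labels))
```

With k = 0 this reduces to comparing plain symbol frequencies; larger k splits each nucleotide into more "types", which is the sense in which the alphabet is augmented. Choosing k well is exactly the kind of question the heuristic for selecting context lengths mentioned in the abstract is meant to address.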