Frequency-based similarity for parameterized sequences: Formal framework, algorithms, and applications

  • Authors:
  • Gianluigi Greco;Giorgio Terracina

  • Affiliations:
  • -;-

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 0.07

Visualization

Abstract

Computing sequence similarity is a problem emerging in several areas of research. Current solution algorithms are often based on alignment methods under the assumption that matching symbols, or at least a substitution schema among them, are known in advance. This is natural for sequences defined over the same alphabet of symbols. However, for sequences defined over different alphabets and in absence of an appropriate background knowledge, sequence similarity can be conveniently reconsidered from a different perspective where determining the best substitution schema is also part of the computation problem. The basic idea is that any symbol of a sequence can be correlated with many symbols of another, provided each correlation frequently occurs over the various positions of the alignment. This novel setting is formalized and relevant application domains fitting its peculiarities are illustrated. Moreover, the computational complexity of the alignment problems arising therein is analyzed, and practical solution approaches are proposed and validated over synthetic and real datasets.