The subsequence composition of a string

  • Authors:
  • Alberto Apostolico; Fabio Cunial

  • Affiliations:
  • Dipartimento di Ingegneria dell'Informazione, Università di Padova, Via Gradenigo 6/A, Padova, Italy and College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA ...; College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30318, USA

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2009

Quantified Score

Hi-index 5.23

Abstract

Words that appear as constrained subsequences in a text-string are considered as possible indicators of the host string structure, hence also as a possible means of sequence comparison and classification. The constraint consists of imposing a bound on the number ω of positions in the text that may intervene between any two consecutive characters of a subsequence. A subset of such ω-sequences is then characterized that consists, in intuitive terms, of sequences that could not be enriched with more characters without losing some occurrence in the text. A compact spatial representation is then proposed for these representative sequences, within which a number of parameters can be defined and measured. In the final part of the paper, such parameters are empirically analyzed on a small collection of text-strings endowed with various degrees of structure.