The subsequence composition of a string

Authors:
Alberto Apostolico;Fabio Cunial
Affiliations:
Dipartimento di Ingegneria dell'Informazione, Università di Padova, Via Gradenigo 6/A, Padova, Italy and College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA ...;College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30318, USA
Venue:
Theoretical Computer Science
Year:
2009

Abstract

Words that appear as constrained subsequences in a text-string are considered as possible indicators of the host string structure, hence also as a possible means of sequence comparison and classification. The constraint consists of imposing a bound on the number ω of positions in the text that may intervene between any two consecutive characters of a subsequence. A subset of such ω-sequences is then characterized that consists, in intuitive terms, of sequences that could not be enriched with more characters without losing some occurrence in the text. A compact spatial representation is then proposed for these representative sequences, within which a number of parameters can be defined and measured. In the final part of the paper, such parameters are empirically analyzed on a small collection of text-strings endowed with various degrees of structure.
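
To make the gap constraint concrete, the following is a minimal Python sketch of a membership test: does a word occur in a text as a subsequence whose consecutive characters are separated by at most a given number of text positions? The function name `occurs_with_gap` and the parameter `omega` are illustrative and do not come from the paper. The sketch assumes the bound counts the positions strictly between two matched characters, so adjacent characters have gap 0; the paper's exact convention may differ by one. It is not the paper's construction, only a plain dynamic-programming check.

```python
def occurs_with_gap(text: str, word: str, omega: int) -> bool:
    """Return True if `word` occurs in `text` as a subsequence with at most
    `omega` text positions between any two consecutive matched characters.

    Assumption (not from the paper): the gap counts positions strictly
    between matches, so matches at i < j are allowed when j - i - 1 <= omega.
    """
    if not word:
        return True  # the empty word occurs trivially

    # Text positions where a gap-respecting match of the current prefix ends.
    ends = {i for i, c in enumerate(text) if c == word[0]}

    for ch in word[1:]:
        # Extend every partial match by one character inside the allowed window.
        ends = {
            j
            for i in ends
            for j in range(i + 1, min(i + omega + 2, len(text)))
            if text[j] == ch
        }
        if not ends:
            return False  # no partial match can be extended

    return bool(ends)


if __name__ == "__main__":
    print(occurs_with_gap("abcab", "ac", 0))
    print(occurs_with_gap("abcab", "ac", 1))
```

The sketch tracks every feasible end position because a greedy leftmost match can miss valid occurrences under a gap bound: in "axbab" with `omega = 0`, the first "a" cannot be extended to a "b", but the second one can.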