Words that appear as constrained subsequences in a text-string are considered as possible indicators of the host string structure, hence also as a possible means of sequence comparison and classification. The constraint consists of imposing a bound on the number ω of positions in the text that may intervene between any two consecutive characters of a subsequence. A subset of such ω-sequences is then characterized that consists, in intuitive terms, of sequences that could not be enriched with more characters without losing some occurrence in the text. A compact spatial representation is then proposed for these representative sequences, within which a number of parameters can be defined and measured. In the final part of the paper, such parameters are empirically analyzed on a small collection of text-strings endowed with various degrees of structure.
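The gap constraint can be illustrated with a minimal sketch. It assumes the reading given above, namely that at most ω text positions may lie strictly between two consecutive matched characters, so their indices differ by at most ω + 1. The function name `occurs_with_gap_bound` and its parameters are hypothetical and chosen for illustration only; this is a plain dynamic-programming check, not the representation proposed in the paper.

```python
def occurs_with_gap_bound(text: str, word: str, omega: int) -> bool:
    """Return True if `word` occurs in `text` as a subsequence in which at most
    `omega` text positions intervene between consecutive matched characters.

    Assumption: the bound counts intervening positions, so two consecutive
    matches at indices i < j must satisfy j - i <= omega + 1.
    """
    if not word:
        return True  # the empty word occurs trivially

    # Text positions at which the current prefix of `word` can end.
    ends = {i for i, ch in enumerate(text) if ch == word[0]}

    for ch in word[1:]:
        next_ends = set()
        for i in ends:
            # Candidate positions for the next character: i+1 .. i+omega+1.
            for j in range(i + 1, min(i + omega + 2, len(text))):
                if text[j] == ch:
                    next_ends.add(j)
        ends = next_ends
        if not ends:
            return False

    return bool(ends)
```

For instance, in the text `abcabc` the word `ac` occurs under this definition with ω = 1 (one position intervenes between `a` and `c`) but not with ω = 0, which would require the two characters to be adjacent.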