Compact recognizers of episode sequences

Authors:
Alberto Apostolico;Mikhail J. Atallah
Affiliations:
Purdue Univ., West Lafayette, Indiana;Purdue Univ., West Lafayette, Indiana
Venue:
Information and Computation
Year:
2002

Abstract

Given two strings $X = a_1 \ldots a_n$ and $P = b_1 \ldots b_m$ over an alphabet $\Sigma$, the problem of testing whether $P$ occurs as a subsequence of $X$ is trivially solved in linear time. It is also known that a simple $O(n \log |\Sigma|)$ time preprocessing of $X$ makes it easy to decide subsequently, for any $P$ and in at most $|P| \log |\Sigma|$ character comparisons, whether $P$ is a subsequence of $X$. These problems become more complicated if one asks instead whether $P$ occurs as a subsequence of some substring $Y$ of $X$ of bounded length. This paper presents an automaton built on the textstring $X$ and capable of identifying all distinct minimal substrings $Y$ of $X$ having $P$ as a subsequence. By a substring $Y$ being minimal with respect to $P$, it is meant that $P$ is not a subsequence of any proper substring of $Y$. For every minimal substring $Y$, the automaton recognizes the occurrence of $P$ having the lexicographically smallest sequence of symbol positions in $Y$. It is not difficult to realize such an automaton in time and space $O(n^2)$ for a text of $n$ characters. One result of this paper consists of bringing those bounds down to linear or $O(n \log n)$, respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the corresponding complexities of automata constructions for offline exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of $P$ in $X$ is carried out in time $O(\sum_{i=1}^{m} \mathrm{rocc}_i \cdot i)$ or $O(n + \sum_{i=1}^{m} \mathrm{rocc}_i \cdot i \cdot \log n)$, depending on whether the alphabet is fixed or arbitrary, where $\mathrm{rocc}_i$ is the number of distinct minimal substrings of $X$ having $b_1 \ldots b_i$ as a subsequence (note that each such substring may occur many times in $X$ but is counted only once in the bound). All $\log$ factors appearing in the above bounds can be further reduced to $\log \log$ by resorting to known integer-handling data structures.
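
To make the notions in the abstract concrete, the following Python sketch illustrates three of them: the linear-time subsequence test, a preprocessed text that answers subsequence queries with one search per pattern symbol, and the set of distinct minimal substrings of X having P as a subsequence. It is not the automaton of the paper. All function names are hypothetical. The index shown here is a plain per-symbol list of positions queried by binary search, so each pattern symbol costs O(log n) comparisons rather than the log |Σ| quoted above, and `minimal_windows` is a brute-force quadratic procedure used only to show what "minimal" means.

```python
from bisect import bisect_left
from collections import defaultdict


def is_subsequence(p, x):
    """Linear scan: True if p occurs as a subsequence of x."""
    it = iter(x)
    # 'ch in it' advances the iterator until ch is found (or it is exhausted).
    return all(ch in it for ch in p)


def build_index(x):
    """Map each symbol to the sorted list of positions where it occurs in x."""
    index = defaultdict(list)
    for pos, ch in enumerate(x):
        index[ch].append(pos)
    return index


def is_subsequence_indexed(p, index):
    """Test p against a preprocessed text: one binary search per symbol of p."""
    pos = -1
    for ch in p:
        occ = index.get(ch, [])
        k = bisect_left(occ, pos + 1)  # first occurrence of ch after pos
        if k == len(occ):
            return False
        pos = occ[k]
    return True


def minimal_windows(p, x):
    """Distinct substrings y of x such that p is a subsequence of y
    but of no proper substring of y (brute force, quadratic time)."""
    found = set()
    if not p:
        return found
    for start in range(len(x)):
        if x[start] != p[0]:
            continue
        # Forward greedy match: earliest end such that p fits in x[start..end].
        j, end = 0, start
        while end < len(x):
            if x[end] == p[j]:
                j += 1
                if j == len(p):
                    break
            end += 1
        if j < len(p):
            break  # p does not fit from here, nor from any later start
        # Backward greedy match from end: latest start s with p in x[s..end].
        j, s = len(p) - 1, end
        while True:
            if x[s] == p[j]:
                j -= 1
                if j < 0:
                    break
            s -= 1
        found.add(x[s:end + 1])
    return found
```

For example, with X = "abcacb" and P = "ab", `minimal_windows` yields the two substrings "ab" and "acb": neither contains a shorter substring that still has "ab" as a subsequence.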