Given a text string x of n symbols and an integer constant d, we consider the problem of finding, for every pair (y, z) of subwords of x, the tandem index of the pair. This index is defined as the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols in x. Although x may in principle contain O(n^4) distinct subword pairs, we show that it suffices to consider a family of only O(n^2) such pairs. This family has the property that for any neglected pair (y', z') there exists a corresponding pair (y, z) in the family such that: (i) y' is a prefix of y and z' is a prefix of z; and (ii) the tandem index of (y', z') equals that of (y, z). The main contribution of the paper is an algorithm that computes all nonzero tandem indices of a string optimally, in time and space linear in the size of the output.
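To make the definition of the tandem index concrete, the following is a minimal brute-force sketch in Python. It is not the paper's output-linear algorithm. It is a naive baseline that checks every subword pair directly, and it relies on a definition that is our own reading of the abstract. Under that assumption, an occurrence of y at position i and an occurrence of z at position j are in tandem if z starts after y ends, the gap j - (i + |y|) is at most d, and no occurrence of y or z starts strictly between i and j. The abstract does not specify how distance is measured or how overlaps are treated. The function names `occurrences`, `tandem_index`, and `all_nonzero_tandem_indices` are hypothetical.

```python
from itertools import product


def occurrences(x: str, w: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of w in x."""
    return [i for i in range(len(x) - len(w) + 1) if x.startswith(w, i)]


def tandem_index(x: str, y: str, z: str, d: int) -> int:
    """Naive count of tandem occurrences of (y, z) in x.

    Assumed (hypothetical) definition: y at i and z at j form a tandem if
    z starts after y ends, the gap j - (i + len(y)) is at most d, and no
    occurrence of y or z starts strictly between i and j.
    """
    if not y or not z:
        raise ValueError("y and z must be non-empty")
    ys = occurrences(x, y)
    zs = occurrences(x, z)
    count = 0
    for i in ys:
        # Only the first z starting after i can pair with this y: any later
        # z would have this one as an intermediate occurrence.
        j = next((p for p in zs if p > i), None)
        if j is None:
            continue
        gap = j - (i + len(y))
        if gap < 0 or gap > d:  # z overlaps y, or lies too far away
            continue
        if any(i < k < j for k in ys):  # an intermediate occurrence of y
            continue
        count += 1
    return count


def all_nonzero_tandem_indices(x: str, d: int) -> dict[tuple[str, str], int]:
    """Brute-force baseline over all O(n^4) subword pairs (tiny inputs only)."""
    subwords = {x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1)}
    result = {}
    for y, z in product(subwords, repeat=2):
        t = tandem_index(x, y, z, d)
        if t:
            result[(y, z)] = t
    return result


if __name__ == "__main__":
    print(tandem_index("abcab", "a", "b", 0))  # prints 2
```

For example, in "aab" with y = "a" and z = "b", only the second "a" is counted. The first "a" is separated from "b" by an intermediate occurrence of y. The exhaustive enumeration in `all_nonzero_tandem_indices` shows why the O(n^2) family and the output-sensitive bound stated in the abstract matter: even on short strings, the naive approach touches every one of the O(n^4) candidate pairs.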