Fast and practical algorithms for computing all the runs in a string

Authors:
Gang Chen; Simon J. Puglisi; W. F. Smyth
Affiliations:
Department of Computing & Software, McMaster University, Hamilton, Ontario, Canada; Department of Computing, Curtin University, Perth, Australia; Department of Computing & Software, McMaster University, Hamilton, Ontario, Canada and Department of Computing, Curtin University, Perth, Australia
Venue:
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Year:
2007

Citing 20
Cited 0

An O(n log n) algorithm for finding all repetitions in a string. Journal of Algorithms
Detecting leftmost maximal periodicities. Discrete Applied Mathematics - Combinatorics and complexity
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM)
Reducing the space requirement of suffix trees. Software: Practice & Experience
Space-Efficient Data Structures for Flexible Text Retrieval Systems. ISAAC '02 Proceedings of the 13th International Symposium on Algorithms and Computation
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Optimal suffix tree construction with large alphabets. FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science
Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms - SPIRE 2002
Computing quasi suffix arrays. Journal of Automata, Languages and Combinatorics - Special issue: Selected papers of the 13th Australasian workshop on combinatorial algorithms
Engineering a Lightweight Suffix Array Construction Algorithm. Algorithmica
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing
Compressed full-text indexes. ACM Computing Surveys (CSUR)
A New Periodicity Lemma. SIAM Journal on Discrete Mathematics
A taxonomy of suffix array construction algorithms. ACM Computing Surveys (CSUR)
Linear pattern matching algorithms. SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Space efficient linear time construction of suffix arrays. CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Simple linear work suffix array construction. ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
The number of runs in a string: improved analysis of the linear upper bound. STACS'06 Proceedings of the 23rd Annual conference on Theoretical Aspects of Computer Science
On the Complexity of Finite Sequences. IEEE Transactions on Information Theory
A universal algorithm for sequential data compression. IEEE Transactions on Information Theory

Abstract

A repetition in a string x is a substring w = u^e of x, with the exponent e ≥ 2 taken as large as possible, where u is not itself a repetition in w. A run in x is a substring w = u^e u* of "maximal periodicity", where u^e is a repetition and u* is a maximum-length, possibly empty, proper prefix of u. A run may encode as many as |u| repetitions. The maximum number of repetitions in any string x = x[1..n] is well known to be Θ(n log n). In 2000 Kolpakov & Kucherov showed that the maximum number of runs in x is O(n); they also described a Θ(n)-time algorithm to compute all the runs in x, based on Farach's Θ(n)-time suffix tree construction algorithm (STCA), Θ(n)-time Lempel-Ziv factorization, and Main's Θ(n)-time leftmost runs algorithm. Recently Abouelhoda et al. proposed a Θ(n)-time Lempel-Ziv factorization algorithm based on an "enhanced" suffix array (a suffix array together with other supporting data structures). In this paper we introduce a collection of fast, space-efficient algorithms for computing all the runs in a string; in many circumstances these appear to be superior to the algorithms previously proposed.
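
To make the definition of a run concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms introduced in the paper, nor the Kolpakov & Kucherov method: it simply tries every candidate period and runs in roughly quadratic time. The function name `naive_runs` and the `(start, end, period)` output format are assumptions made for illustration, and positions are 0-based, whereas the abstract writes x[1..n].

```python
def naive_runs(x):
    """Return every run in x as (start, end, period).

    Indices are 0-based and inclusive. A run is reported once, with its
    smallest period. Illustrative brute force only (about O(n^2) time).
    """
    n = len(x)
    seen = set()   # (start, end) intervals already reported at a smaller period
    runs = []
    for p in range(1, n // 2 + 1):      # candidate period |u|
        k = 0
        while k + p < n:
            if x[k] != x[k + p]:
                k += 1
                continue
            a = k
            # Extend the stretch of positions k with x[k] == x[k + p].
            while k + p < n and x[k] == x[k + p]:
                k += 1
            # x[a .. k+p-1] has period p and cannot be extended either way.
            start, end = a, k + p - 1
            # Require at least two full copies of u (exponent e >= 2).
            # Periods are tried in increasing order, so an interval already
            # seen has a smaller period and is skipped here.
            if k - a >= p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
    return sorted(runs)


if __name__ == "__main__":
    for start, end, period in naive_runs("mississippi"):
        print(start, end, period, "mississippi"[start:end + 1])
```

On "mississippi" this sketch reports the run "ississi" (u = "iss", e = 2, u* = "i") together with the three period-1 runs "ss", "ss" and "pp".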