Fast and Practical Algorithms for Computing All the Runs in a String

Authors:
Gang Chen; Simon J. Puglisi; W. F. Smyth
Affiliations:
Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, L8S 4K1, Canada;Department of Computing, Curtin University, GPO Box U1987, Perth WA 6845, Australia;Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, L8S 4K1, Canada and Department of Computing, Curtin University, GPO Box U1987, Perth WA 6845, ...
Venue:
CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
Year:
2007

Abstract

A repetition in a string $\mathbf{x}$ is a substring $\mathbf{w} = \mathbf{u}^e$ of $\mathbf{x}$, with $e \ge 2$ maximum, where $\mathbf{u}$ is not itself a repetition in $\mathbf{w}$. A run in $\mathbf{x}$ is a substring $\mathbf{w} = \mathbf{u}^e\mathbf{u}^{*}$ of "maximal periodicity", where $\mathbf{u}^e$ is a repetition and $\mathbf{u}^{*}$ is a maximum-length, possibly empty, proper prefix of $\mathbf{u}$. A run may encode as many as $|\mathbf{u}|$ repetitions. The maximum number of repetitions in any string $\mathbf{x} = \mathbf{x}[1..n]$ is well known to be $\Theta(n \log n)$. In 2000 Kolpakov & Kucherov showed that the maximum number of runs in $\mathbf{x}$ is $O(n)$; they also described a $\Theta(n)$-time algorithm to compute all the runs in $\mathbf{x}$, based on Farach's $\Theta(n)$-time suffix tree construction algorithm (STCA), $\Theta(n)$-time Lempel-Ziv factorization, and Main's $\Theta(n)$-time leftmost runs algorithm. Recently Abouelhoda et al. proposed a $\Theta(n)$-time Lempel-Ziv factorization algorithm based on an "enhanced" suffix array (a suffix array together with other supporting data structures). In this paper we introduce a collection of fast, space-efficient algorithms for computing all the runs in a string that appear in many circumstances to be superior to those previously proposed.
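
To make the definition of a run concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms introduced in the paper, nor the Kolpakov & Kucherov method: it simply tries every candidate period and takes quadratic time in the worst case. The function name `naive_runs` and the 0-based `(start, end, period)` output format are assumptions made for illustration only.

```python
def naive_runs(x):
    """Return all runs of x as sorted (start, end, period) triples.

    Indices are 0-based and inclusive. Illustrative brute force only:
    roughly O(n^2) time, unlike the linear-time methods in the abstract.
    """
    n = len(x)
    seen = set()      # (start, end) spans already reported with a smaller period
    result = []
    for p in range(1, n // 2 + 1):          # candidate period
        k = 0
        while k + p < n:
            if x[k] != x[k + p]:
                k += 1
                continue
            a = k                            # left end of a maximal match stretch
            while k + p < n and x[k] == x[k + p]:
                k += 1
            start, end = a, k - 1 + p        # x[start..end] has period p, maximal
            # A run needs at least two full copies of the period. Periods are
            # tried in increasing order, so a span seen earlier already has a
            # smaller period and p is not its minimum period.
            if end - start + 1 >= 2 * p and (start, end) not in seen:
                seen.add((start, end))
                result.append((start, end, p))
    return sorted(result)


if __name__ == "__main__":
    for start, end, period in naive_runs("mississippi"):
        print(start, end, period, "mississippi"[start:end + 1])
```

For `"mississippi"` this reports the three period-1 runs `ss`, `ss`, `pp` and the period-3 run `ississi`, which has the form $\mathbf{u}^e\mathbf{u}^{*}$ with $\mathbf{u}$ = `iss`, $e = 2$ and $\mathbf{u}^{*}$ = `i`.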