Fast and Practical Algorithms for Computing All the Runs in a String

  • Authors:
  • Gang Chen; Simon J. Puglisi; W. F. Smyth

  • Affiliations:
  • Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, L8S 4K1, Canada;Department of Computing, Curtin University, GPO Box U1987, Perth WA 6845, Australia;Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, L8S 4K1, Canada and Department of Computing, Curtin University, GPO Box U1987, Perth WA 6845, ...

  • Venue:
  • CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
  • Year:
  • 2007

Abstract

A repetition in a string $\mathbf{x}$ is a substring $\mathbf{w} = \mathbf{u}^e$ of $\mathbf{x}$, maximum $e \ge 2$, where $\mathbf{u}$ is not itself a repetition in $\mathbf{w}$. A run in $\mathbf{x}$ is a substring $\mathbf{w} = \mathbf{u}^e\mathbf{u}^{*}$ of "maximal periodicity", where $\mathbf{u}^e$ is a repetition and $\mathbf{u}^{*}$ is a maximum-length, possibly empty, proper prefix of $\mathbf{u}$. A run may encode as many as $|\mathbf{u}|$ repetitions. The maximum number of repetitions in any string $\mathbf{x} = \mathbf{x}[1..n]$ is well known to be $\Theta(n \log n)$. In 2000 Kolpakov & Kucherov showed that the maximum number of runs in $\mathbf{x}$ is $O(n)$; they also described a $\Theta(n)$-time algorithm to compute all the runs in $\mathbf{x}$, based on Farach's $\Theta(n)$-time suffix tree construction algorithm (STCA), $\Theta(n)$-time Lempel-Ziv factorization, and Main's $\Theta(n)$-time leftmost runs algorithm. Recently Abouelhoda et al. proposed a $\Theta(n)$-time Lempel-Ziv factorization algorithm based on an "enhanced" suffix array, that is, a suffix array together with other supporting data structures. In this paper we introduce a collection of fast, space-efficient algorithms for computing all the runs in a string that appear in many circumstances to be superior to those previously proposed.
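
To make the definition of a run concrete, the following is a minimal brute-force sketch in Python. It is not any of the algorithms introduced in the paper, nor the Kolpakov & Kucherov method; it simply enumerates, for each candidate period p, the maximal stretches of the string that have period p, and takes quadratic time in the worst case. The function name `naive_runs` and the output convention (0-based, inclusive `(start, end, period)` triples, unlike the 1-based indexing of the abstract) are assumptions made for illustration only.

```python
def naive_runs(x):
    """Return all runs of x as sorted (start, end, period) triples.

    Illustrative brute force only (hypothetical helper, not the paper's
    algorithm). Indices are 0-based and `end` is inclusive.
    """
    n = len(x)
    seen = set()   # (start, end) intervals already reported with a smaller period
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if x[k] != x[k + p]:
                k += 1
                continue
            i = k
            # Extend the stretch of positions k with x[k] == x[k + p].
            while k + p < n and x[k] == x[k + p]:
                k += 1
            # x[start..end] has period p and cannot be extended either way.
            start, end = i, k + p - 1
            # Length >= 2p means the period repeats at least twice (e >= 2).
            # Periods are tried in increasing order, so an interval seen
            # before already has a smaller (minimal) period and is skipped.
            if k - i >= p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
    return sorted(runs)


if __name__ == "__main__":
    print(naive_runs("mississippi"))
```

For the string `mississippi` this sketch reports the three single-letter squares `ss`, `ss`, `pp` and the run `ississi` with period 3, which corresponds to u = `iss`, e = 2 and u* = `i` in the notation above.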