Repetition Complexity of Words

  • Authors:
  • Lucian Ilie; Sheng Yu; Kaizhong Zhang

  • Venue:
  • COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics
  • Year:
  • 2002

Abstract

Drawing on ideas from data compression and combinatorics on words, we introduce a complexity measure for words, called repetition complexity, which quantifies the amount of repetition in a word. The repetition complexity of w, r(w), is defined as the smallest amount of space needed to store w when reduced by repeatedly applying the following procedure: n consecutive occurrences uu...u of the same subword u of w are stored as (u, n). The repetition complexity has interesting relations with well-known complexity measures, such as subword complexity, sub, and Lempel-Ziv complexity, lz. We always have r(w) ≥ lz(w), and it can even happen that the former is linear while the latter is only logarithmic; e.g., this happens for prefixes of certain infinite words obtained by iterated morphisms. An infinite word α being ultimately periodic is equivalent to: (i) sub(pref_n(α)) = O(n), (ii) lz(pref_n(α)) = O(1), and (iii) r(pref_n(α)) = lg n + O(1). De Bruijn words, well known for their high subword complexity, are shown to have almost the highest repetition complexity; the precise complexity remains open. r(w) can be computed in time O(n^3 (log n)^2), and it is open, and probably very difficult, to find very fast algorithms.
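
To make the reduction procedure concrete, the following is a minimal sketch of a repetition-style cost under an assumed cost model, not the authors' algorithm or their exact size function. The assumptions are: each stored letter costs 1, a stored pair (u, n) with n ≥ 2 costs the cost of u plus the number of binary digits of n, and reductions may be nested inside u. The function name `repetition_cost` is hypothetical. It uses a straightforward interval dynamic program and is meant only for short words.

```python
from functools import lru_cache


def repetition_cost(w: str) -> int:
    """Smallest encoding size of w under the ASSUMED cost model:
    a letter costs 1; a block (u, n), n >= 2, costs cost(u) + bits(n)."""

    @lru_cache(maxsize=None)
    def cost(i: int, j: int) -> int:
        # Cheapest encoding of the segment w[i:j] (non-empty).
        length = j - i
        if length == 1:
            return 1
        # Option 1: encode two adjacent parts independently.
        best = min(cost(i, k) + cost(k, j) for k in range(i + 1, j))
        # Option 2: the whole segment is u repeated n >= 2 times.
        for p in range(1, length // 2 + 1):
            if length % p == 0:
                n = length // p
                if w[i:j] == w[i:i + p] * n:
                    # Store (u, n): nested cost of u plus binary digits of n.
                    best = min(best, cost(i, i + p) + n.bit_length())
        return best

    return cost(0, len(w)) if w else 0


if __name__ == "__main__":
    print(repetition_cost("abababab"))  # (ab, 4) under the assumed model: 2 + 3 = 5
    print(repetition_cost("abc"))       # no repetition to exploit: 3
```

Under this model a highly periodic word such as a^n costs roughly lg n, in line with the logarithmic behaviour stated for prefixes of ultimately periodic words, while a word with no adjacent repetitions costs its full length.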