Repetitions in strings: Algorithms and combinatorics

  • Authors:
  • Maxime Crochemore;Lucian Ilie;Wojciech Rytter

  • Affiliations:
  • Department of Computer Science, Kings College London, London WC2R 2LS, UK and Université Paris-Est, France;Department of Computer Science, University of Western Ontario, N6A 5B7, London, Ontario, Canada;Institute of Informatics, Warsaw University, ul. Banacha 2, 02-097 Warszawa, Poland and Department of Mathematics and Informatics, Copernicus University, Torun, Poland

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2009

Quantified Score

Hi-index 5.24

Visualization

Abstract

The article is an overview of basic issues related to repetitions in strings, concentrating on algorithmic and combinatorial aspects. This area is important both from theoretical and practical points of view. Repetitions are highly periodic factors (substrings) in strings and are related to periodicities, regularities, and compression. The repetitive structure of strings leads to higher compression rates, and conversely, some compression techniques are at the core of fast algorithms for detecting repetitions. There are several types of repetitions in strings: squares, cubes, and maximal repetitions also called runs. For these repetitions, we distinguish between the factors (sometimes qualified as distinct) and their occurrences (also called positioned factors). The combinatorics of repetitions is a very intricate area, full of open problems. For example we know that the number of (distinct) primitively-rooted squares in a string of length n is no more than 2n-@Q(logn), conjecture to be n, and that their number of occurrences can be @Q(nlogn). Similarly we know that there are at most 1.029n and at least 0.944n maximal repetitions and the conjecture is again that the exact bound is n. We know almost everything about the repetitions in Sturmian words, but despite the simplicity of these words, the results are nontrivial. One of the main motivations for writing this text is the development during the last couple of years of new techniques and results about repetitions. We report both the progress which has been achieved and which we expect to happen.