Counting suffix arrays and strings

  • Authors:
  • Klaus-Bernd Schürmann; Jens Stoye

  • Affiliations:
  • AG Genominformatik, Technische Fakultät, Universität Bielefeld, Germany; AG Genominformatik, Technische Fakultät, Universität Bielefeld, Germany

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2008

Abstract

Suffix arrays are used in a variety of applications and research areas, such as data compression and computational biology. In this work, we characterise the combinatorial properties of suffix arrays and their enumeration. For a fixed alphabet size and string length, we partition the set of all strings into equivalence classes of strings that share the same suffix array. For each equivalence class, we count the number of strings it contains. We also give exact formulas for the number of equivalence classes. Our methods yield a lower bound on the compressibility of suffix arrays and lay the foundation for efficiently generating suitable test data sets for suffix-array-based algorithms. We also show that summing the sizes of all equivalence classes gives a particular instance of certain summation identities for Eulerian numbers.
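
The partition described in the abstract can be illustrated by brute force for very small inputs. The sketch below is not the authors' method and does not use the paper's closed formulas. It enumerates every string of length n over an alphabet of size sigma, computes each suffix array by naive sorting, and groups the strings by the resulting array. The function names `suffix_array` and `count_classes` are hypothetical, letters are represented as the integers 0 to sigma-1, and positions are 0-based.

```python
from collections import Counter
from itertools import product


def suffix_array(s):
    """Return the suffix array of s as a tuple of 0-based start positions.

    Naive construction: sort the start positions by the suffix beginning
    there. A shorter suffix that is a prefix of a longer one sorts first.
    """
    return tuple(sorted(range(len(s)), key=lambda i: s[i:]))


def count_classes(n, sigma):
    """Group all sigma**n strings of length n by their suffix array.

    Returns a Counter mapping each suffix array that occurs to the number
    of strings sharing it (the size of its equivalence class).
    """
    classes = Counter()
    for s in product(range(sigma), repeat=n):
        classes[suffix_array(s)] += 1
    return classes


if __name__ == "__main__":
    n, sigma = 3, 2
    classes = count_classes(n, sigma)
    for sa, size in sorted(classes.items()):
        print(sa, size)
    print("classes:", len(classes), "strings:", sum(classes.values()))
```

For n = 3 and sigma = 2, the enumeration finds 5 distinct suffix arrays among the 8 strings, and the class sizes necessarily sum to sigma^n. The running time grows as sigma^n, so this is only useful for checking small cases by hand.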