Linear work suffix array construction

  • Authors:
  • Juha Kärkkäinen;Peter Sanders;Stefan Burkhardt

  • Affiliations:
  • University of Helsinki, Helsinki, Finland;University of Karlsruhe, Karlsruhe, Germany;Google, Inc., Zurich, Switzerland

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2006

Quantified Score

Hi-index 0.01

Visualization

Abstract

Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to linear-time construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple linear-time construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a space-efficient implementation and, moreover, supports the choice of a space--time tradeoff. For any v ∈ [1,&nradic;], it runs in O(vn) time using O(n/&vradic;) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.