Fast and Space Efficient Linear Suffix Array Construction

  • Authors:
  • Sen Zhang;Ge Nong

  • Affiliations:
  • -;-

  • Venue:
  • DCC '08 Proceedings of the Data Compression Conference
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a novel fast and space efficient linear suffix array construction algorithm (SACA) to break the performance and design bottlenecks for the existing linear SACAs. By sampling the fixed-size d-critical substrings to divide-and-conquer the problem, our new algorithm is very simple, for which a fully-functioning sample implementation is embodied in only about 100 lines of C code. The experimental results on the Canterbury and Manzini-Ferragina corpora show that our algorithm outperforms boththe K\"arkk\"ainen-Sanders (KS) and the Ko-Aluru (KA) algorithms: compared with the KS, ours can be more than twice faster and use more than 50% fewer space; compared with the KA, ours can be 9% faster and use 40% fewer space. To approach the lightweight space extreme, we further improve our linear algorithm to use an extra working space of only 0.25n+O(1) bytes to construct the suffix array for any size-n string of a constant or integer alphabet,where the characters of an integer alphabet are in [0..n-1]. Besides using less space, our lightweight linear algorithm stil lruns more than 1.5 times faster than the KS algorithm in the experiments.