Alphabet-independent linear-time construction of compressed suffix arrays using o(nlogn)-bit working space

  • Authors:
  • Joong Chae Na;Kunsoo Park

  • Affiliations:
  • Department of Advanced Technology Fusion, Konkuk University, Seoul 143-701, South Korea;School of Computer Science and Engineering, Seoul National University, Seoul 151-742, South Korea

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2007

Quantified Score

Hi-index 5.26

Visualization

Abstract

The suffix array is a fundamental index data structure in string algorithms and bioinformatics, and the compressed suffix array (CSA) and the FM-index are its compressed versions. Many algorithms for constructing these index data structures have been developed. Recently, Hon et al. [W.K. Hon, K. Sadakane, W.K. Sung, Breaking a time-and-space barrier in constructing full-text indices, in: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, 2003, pp. 251-260] proposed a construction algorithm using O(nloglog|Σ|) time and O(nlog|Σ|)-bit working space, which is the fastest algorithm using O(nlog|Σ|)-bit working space. In this paper we give an efficient algorithm to construct the index data structures. Our algorithm constructs the suffix array, the CSA, the FM-index, and Burrows-Wheeler transform using alphabet-independent O(n) time and -bit working space, where α=log32. Our algorithm takes less time and more space than Hon et al.'s algorithm. Our algorithm uses least working space among alphabet-independent linear-time algorithms.