Compressed Suffix Arrays for Massive Data

  • Author: Jouni Sirén
  • Affiliation: Department of Computer Science, University of Helsinki, Finland
  • Venue: SPIRE '09: Proceedings of the 16th International Symposium on String Processing and Information Retrieval
  • Year: 2009

Abstract

We present a fast, space-efficient algorithm for constructing compressed suffix arrays (CSAs). The algorithm requires O(n log n) time in the worst case and only O(n) bits of extra space in addition to the CSA. As the basic step, we describe an algorithm for merging two CSAs. We show that the construction algorithm can be parallelized on a symmetric multiprocessor system, and we discuss the possibility of a distributed implementation. We also describe a parallel implementation of the algorithm that is capable of indexing several gigabytes of text per hour.
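The abstract names merging two CSAs as the basic step. The sketch below illustrates the general idea behind such a merge on plain, uncompressed Burrows-Wheeler transforms (BWTs), which many CSAs are built around. Text B is read backward from its own BWT via LF-mapping, each suffix of B is ranked among the suffixes of A by backward search, and the two BWTs are then interleaved according to those ranks. This is a minimal sketch under stated assumptions, not the paper's implementation: each side holds a single text, the terminators `SEP_A` and `SEP_B` are hypothetical sentinels assumed to be the two smallest symbols and to occur nowhere else, rank queries are naive linear scans instead of compressed structures, and all function names are hypothetical.

```python
# Hypothetical sentinels: SEP_A < SEP_B < every other symbol (assumption).
SEP_A = "\x00"  # terminator of text A
SEP_B = "\x01"  # terminator of text B


def bwt(text: str) -> str:
    """Naive BWT of a terminated text: sort suffixes, emit the preceding symbol."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in order)  # text[-1] wraps to the terminator


def count_smaller(bwt_x: str, c: str) -> int:
    """C[c]: number of symbols in the text that are smaller than c."""
    return sum(1 for ch in bwt_x if ch < c)


def occ(bwt_x: str, c: str, r: int) -> int:
    """Rank query: occurrences of c in bwt_x[0:r] (naive O(r) scan)."""
    return bwt_x[:r].count(c)


def merge_bwt(bwt_a: str, bwt_b: str) -> str:
    """Merge BWT(A) and BWT(B) into the BWT of the collection {A, B}."""
    n_b = len(bwt_b)
    ranks = [0] * n_b  # ranks[j]: number of A-suffixes smaller than B's row j
    row_b = 0          # row 0 of B is the suffix consisting of SEP_B alone
    r = count_smaller(bwt_a, SEP_B)  # A-suffixes smaller than SEP_B
    ranks[row_b] = r
    # Walk B backward one symbol at a time, from its last suffix to B[0:].
    for _ in range(n_b - 1):
        c = bwt_b[row_b]  # symbol preceding the current suffix of B
        # Backward search in A: rank of c + (current suffix) among A-suffixes.
        r = count_smaller(bwt_a, c) + occ(bwt_a, c, r)
        # LF-mapping in B: row of the suffix that is one symbol longer.
        row_b = count_smaller(bwt_b, c) + occ(bwt_b, c, row_b)
        ranks[row_b] = r
    # Interleave: B's row j is placed after the first ranks[j] rows of A.
    out, i = [], 0
    for j in range(n_b):
        out.append(bwt_a[i:ranks[j]])
        i = ranks[j]
        out.append(bwt_b[j])
    out.append(bwt_a[i:])
    return "".join(out)


if __name__ == "__main__":
    a = "banana" + SEP_A
    b = "ananas" + SEP_B
    print(repr(merge_bwt(bwt(a), bwt(b))))
```

Because the ranks are non-decreasing in B's row order, a single left-to-right pass is enough to interleave the two BWTs. In a real CSA, the `count_smaller` and `occ` queries would be answered by compressed rank structures rather than by scanning strings.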