Breaking a Time-and-Space Barrier in Constructing Full-Text Indices

Authors:
Wing-Kai Hon;Kunihiko Sadakane;Wing-Kin Sung
Affiliations:
wkhon@cs.nthu.edu.tw;sada@csce.kyushu-u.ac.jp;ksung@comp.nus.edu.sg
Venue:
SIAM Journal on Computing
Year:
2009

Citing 0
Cited 12

Compressed Suffix Arrays for Massive Data
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval

A Linear-Time Burrows-Wheeler Transform Using Induced Sorting
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval

Scalability of communicators and groups in MPI
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

CST++
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval

Space-efficient construction of Lempel-Ziv compressed text indexes
Information and Computation

Fully compressed suffix trees
ACM Transactions on Algorithms (TALG)

Word-based self-indexes for natural language text
ACM Transactions on Information Systems (TOIS)

Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Lightweight data indexing and compression in external memory
LATIN'10 Proceedings of the 9th Latin American conference on Theoretical Informatics

Memory-Aware BWT by segmenting sequences to support subsequence search
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications

Efficient algorithm for circular burrows-wheeler transform
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching

On compressing and indexing repetitive sequences
Theoretical Computer Science

Abstract

Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. In the literature, the fastest algorithm runs in $O(n)$ time, while it requires $O(n\log n)$-bit working space, where $n$ denotes the length of the text. On the other hand, the most space-efficient algorithm requires $O(n)$-bit working space while it runs in $O(n\log n)$ time. It was open whether these indices can be constructed in both $o(n\log n)$ time and $o(n\log n)$-bit working space. This paper breaks the above time-and-space barrier under the unit-cost word RAM. We give an algorithm for constructing the suffix array, which takes $O(n)$ time and $O(n)$-bit working space, for texts with constant-size alphabets. Note that both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires $O(n\log^{\epsilon}n)$ time and $O(n)$-bit working space for any $0
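
To make the objects in the abstract concrete, the following is a minimal sketch of what a suffix array is and why a plain one occupies on the order of $n\log n$ bits. It is not the algorithm of the paper: it builds the array by naive sorting, which is far slower than the bounds discussed above. The function names `naive_suffix_array` and `plain_suffix_array_bits` are hypothetical and chosen only for illustration.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`,
    listed in lexicographic order of the suffixes.

    Illustration only: this sorts the suffixes directly and is NOT
    the linear-time, O(n)-bit construction described in the abstract.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def plain_suffix_array_bits(n: int) -> int:
    """Bits needed to store n positions at ceil(log2 n) bits each
    (assumed plain, uncompressed representation)."""
    if n <= 1:
        return 0
    bits_per_entry = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 2
    return n * bits_per_entry


sa = naive_suffix_array("banana")
print(sa)  # [5, 3, 1, 0, 4, 2]
print(plain_suffix_array_bits(8))  # 8 entries * 3 bits = 24
```

For "banana" the suffixes in sorted order are "a", "ana", "anana", "banana", "na", "nana", which start at positions 5, 3, 1, 0, 4, 2. Each entry is a position in the text and so needs about $\log n$ bits, whereas a text over a constant-size alphabet needs only $O(n)$ bits in total; that gap is the working-space barrier the abstract refers to.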