Alphabet-independent linear-time construction of compressed suffix arrays using o(n log n)-bit working space

Authors:
Joong Chae Na; Kunsoo Park
Affiliations:
Department of Advanced Technology Fusion, Konkuk University, Seoul 143-701, South Korea; School of Computer Science and Engineering, Seoul National University, Seoul 151-742, South Korea
Venue:
Theoretical Computer Science
Year:
2007

References (13):
New indices for text: PAT Trees and PAT arrays. Information retrieval.
Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing.
Compact pat trees.
On the sorting-complexity of suffix tree construction. Journal of the ACM (JACM).
A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays. COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics.
Tables. Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science.
Optimal suffix tree construction with large alphabets. FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science.
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices. FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science.
New text indexing functionalities of the compressed suffix arrays. Journal of Algorithms.
Indexing compressed text. Journal of the ACM (JACM).
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing.
Linear work suffix array construction. Journal of the ACM (JACM).
A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays. Algorithmica.

Cited by (5):
Compressed Suffix Arrays for Massive Data. SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval.
A Linear-Time Burrows-Wheeler Transform Using Induced Sorting. SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval.
Space-efficient construction of Lempel-Ziv compressed text indexes. Information and Computation.
Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Lightweight data indexing and compression in external memory. LATIN'10 Proceedings of the 9th Latin American conference on Theoretical Informatics.

Abstract

The suffix array is a fundamental index data structure in string algorithms and bioinformatics, and the compressed suffix array (CSA) and the FM-index are its compressed versions. Many algorithms for constructing these index data structures have been developed. Recently, Hon et al. [W.K. Hon, K. Sadakane, W.K. Sung, Breaking a time-and-space barrier in constructing full-text indices, in: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, 2003, pp. 251-260] proposed a construction algorithm using O(n log log|Σ|) time and O(n log|Σ|)-bit working space, which is the fastest algorithm using O(n log|Σ|)-bit working space. In this paper we give an efficient algorithm for constructing these index data structures. Our algorithm constructs the suffix array, the CSA, the FM-index, and the Burrows-Wheeler transform using alphabet-independent O(n) time and O(n log|Σ| log_{|Σ|}^α n)-bit working space, where α = log_3 2. Our algorithm takes less time and more space than Hon et al.'s algorithm. Our algorithm uses the least working space among alphabet-independent linear-time algorithms.
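
To make the objects named in the abstract concrete, the following is a minimal Python sketch of how the suffix array, the Burrows-Wheeler transform, and the Ψ function (the array that a CSA stores in compressed form) relate to one another. It is not the algorithm of this paper: it sorts suffixes naively, so it is far from linear time and uses far more working space. The function names and the example text `banana$` are illustrative assumptions, and the text is assumed to end with a unique sentinel `$` that sorts before every other character.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically.
    Illustrative only; this is NOT a linear-time construction."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """BWT[i] is the character that cyclically precedes the i-th smallest suffix."""
    return "".join(text[pos - 1] for pos in sa)


def psi_from_sa(sa):
    """Psi[i] is the rank of the suffix starting one position after SA[i]
    (wrapping around at the end of the text)."""
    n = len(sa)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r  # inverse suffix array
    return [rank[(pos + 1) % n] for pos in sa]


# Hypothetical example text; '$' is assumed to be the unique smallest character.
text = "banana$"
sa = suffix_array(text)
bwt = bwt_from_sa(text, sa)
psi = psi_from_sa(sa)

print(sa)   # [6, 5, 3, 1, 0, 4, 2]
print(bwt)  # annb$aa
print(psi)  # [4, 0, 5, 6, 3, 1, 2]
```

Within the block of suffixes that share a first character, the Ψ values are increasing (for example 0, 5, 6 for the suffixes starting with `a`), which is the property that makes Ψ compressible.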