A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays

Authors:
Tak Wah Lam; Kunihiko Sadakane; Wing-Kin Sung; Siu-Ming Yiu
Venue:
COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics
Year:
2002

Abstract

With the first human DNA decoded into a sequence of about 2.8 billion base pairs, much biological research has centered on analyzing this sequence. Theoretically speaking, it is now feasible to hold an index for human DNA in main memory so that any pattern can be located efficiently. This is due to the recent breakthrough in compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing a compressed suffix array is still not easy, because the suffix array must be computed first, which requires O(n log n) bits of working memory (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from the text. The main contribution is a new construction algorithm that uses only O(n) bits of working memory and, more importantly, keeps the time complexity the same as before, i.e., O(n log n).
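
To make the objects named in the abstract concrete, the following minimal Python sketch builds a suffix array for a toy string and derives from it the Ψ array that compressed suffix arrays are commonly built around. It is not the construction algorithm of this paper: it deliberately takes the conventional route of materializing the full suffix array first, which is exactly the O(n log n)-bit step the paper sets out to avoid. The function names, the `$` end marker, and the wrap-around convention for the last suffix are assumptions made for illustration.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix text.

    Assumes `text` ends with a unique sentinel ('$') that sorts before
    every other character. Quadratic-ish and memory hungry; toy use only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def psi_from_suffix_array(sa):
    """Psi[i] = rank of the suffix starting one position after suffix sa[i].

    The suffix at the last text position wraps around to position 0
    (an illustrative convention, not taken from the paper).
    """
    n = len(sa)
    rank = [0] * n                      # inverse suffix array
    for r, pos in enumerate(sa):
        rank[pos] = r
    return [rank[(pos + 1) % n] for pos in sa]


if __name__ == "__main__":
    text = "gattaca$"
    sa = suffix_array(text)
    psi = psi_from_suffix_array(sa)
    print("SA :", sa)
    print("Psi:", psi)
```

Within each run of suffixes that begin with the same character, the Ψ values are increasing; in the toy string the three suffixes starting with `a` have Ψ values 0, 4, 7. Increasing runs of this kind can be stored as small differences rather than full log n-bit integers, which is the intuition behind the reduction to O(n) bits over a small alphabet such as DNA.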