Theoretical Computer Science
Rank/select on dynamic compressed sequences and applications
Theoretical Computer Science
Compressed Suffix Arrays for Massive Data
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
A Linear-Time Burrows-Wheeler Transform Using Induced Sorting
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Practical approaches to reduce the space requirement of lempel-ziv--based compressed text indices
Journal of Experimental Algorithmics (JEA)
Space-efficient construction of Lempel-Ziv compressed text indexes
Information and Computation
Lightweight BWT construction for very large string collections
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Lightweight data indexing and compression in external memory
LATIN'10 Proceedings of the 9th Latin American conference on Theoretical Informatics
FEMTO: fast search of large sequence collections
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Efficient algorithm for circular burrows-wheeler transform
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Lightweight algorithms for constructing and inverting the BWT of string collections
Theoretical Computer Science
A Compressed Suffix Tree Based Implementation With Low Peak Memory Usage
Electronic Notes in Theoretical Computer Science (ENTCS)
Hi-index | 0.00 |
With the first human DNA being decoded into a sequence of about 2.8 billion characters, much biological research has been centered on analyzing this sequence. Theoretically speaking, it is now feasible to accommodate an index for human DNA in the main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing compressed suffix arrays is still not an easy task because we still have to compute suffix arrays first and need a working memory of O(n log n) bits (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from the text. The main contribution is a construction algorithm that uses only O(n) bits of working memory, and the time complexity is O(n log n). Our construction algorithm is also time and space efficient for texts with large alphabets such as Chinese or Japanese. Precisely, when the alphabet size is |Σ|, the working space is O(n log |Σ|) bits, and the time complexity remains O(n log n), which is independent of |Σ|.