The Burrows-Wheeler transform (BWT) has received significant attention in academia for addressing subsequence matching problems. Although the BWT is typically used to transform a sequence into a new sequence that is "easy to compress", it can also be extended to serve as a full-text index. A traditional BWT index requires n log n + n log σ bits for a sequence of n characters, where σ is the size of the alphabet. Building a BWT index for a long sequence on a PC with limited memory is therefore a great challenge. To solve this problem, we propose a novel variation of the BWT index named S-BWT, which separates the source sequence into segments. It reduces the memory cost to n(log σ + log n − log k)/k bits, where k is the number of segments. However, querying each segment separately with existing approaches risks missing some significant results. In this paper, we propose two query methods based on S-BWT that are guaranteed to find all subsequence occurrences. Our methods not only require little memory but are also faster than the state-of-the-art BWT backward search method for long sequences.
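The two memory bounds and the risk of lost results can be illustrated with a small sketch. It is not the S-BWT implementation: the function names are hypothetical, logarithms are assumed to be base 2, and plain substring search stands in for BWT backward search. Note that the S-BWT bound equals the traditional formula applied to a single segment of n/k characters.

```python
import math

def bwt_index_bits(n, sigma):
    """Traditional index size from the abstract: n log n + n log sigma (base 2 assumed)."""
    return n * math.log2(n) + n * math.log2(sigma)

def sbwt_index_bits(n, sigma, k):
    """S-BWT size from the abstract: n (log sigma + log n - log k) / k (base 2 assumed)."""
    return n * (math.log2(sigma) + math.log2(n) - math.log2(k)) / k

def find_all(text, pattern):
    """All start positions of pattern in text, overlapping matches included.
    Stand-in for backward search; assumption made for illustration only."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits

def naive_segment_search(text, pattern, k):
    """Search roughly k equal-length segments independently.
    This is the lossy baseline: matches spanning a segment boundary are missed."""
    seg_len = math.ceil(len(text) / k)
    hits = []
    for start in range(0, len(text), seg_len):
        segment = text[start:start + seg_len]
        hits.extend(start + i for i in find_all(segment, pattern))
    return hits

# Memory arithmetic for n = 2^20 characters over a 4-letter alphabet, k = 16 segments.
print(bwt_index_bits(2**20, 4))       # 2^20 * (20 + 2) bits
print(sbwt_index_bits(2**20, 4, 16))  # 2^20 * (2 + 20 - 4) / 16 bits

# Every occurrence of "TACG" straddles a boundary between the three segments "ACGT".
print(find_all("ACGTACGTACGT", "TACG"))                 # searches the whole text
print(naive_segment_search("ACGTACGTACGT", "TACG", 3))  # searches segment by segment
```

In this toy input the whole-text search reports two occurrences while the per-segment search reports none, which is the gap the two proposed query methods are meant to close.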