Theoretical Computer Science
Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Journal of Algorithms
An analysis of the Burrows—Wheeler transform
Journal of the ACM (JACM)
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
COCOON '96 Proceedings of the Second Annual International Conference on Computing and Combinatorics
Opportunistic data structures with applications
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Indexing text using the Ziv-Lempel trie
Journal of Discrete Algorithms - SPIRE 2002
New text indexing functionalities of the compressed suffix arrays
Journal of Algorithms
Journal of the ACM (JACM)
Approximate string matching using compressed suffix arrays
Theoretical Computer Science
When indexing equals compression: Experiments with compressing suffix arrays and applications
ACM Transactions on Algorithms (TALG)
ACM Computing Surveys (CSUR)
Note: A simple storage scheme for strings achieving entropy bounds
Theoretical Computer Science
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Compressed indexes for dynamic text collections
ACM Transactions on Algorithms (TALG)
Compressed Suffix Trees with Full Functionality
Theory of Computing Systems
Dynamic entropy-compressed sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
A compressed self-index using a Ziv---Lempel dictionary
Information Retrieval
Geometric Burrows-Wheeler Transform: Linking Range Searching and Text Indexing
DCC '08 Proceedings of the Data Compression Conference
Dynamic Fully-Compressed Suffix Trees
CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
An Improved Succinct Representation for Dynamic k-ary Trees
CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
FOCS '08 Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science
Rank/select on dynamic compressed sequences and applications
Theoretical Computer Science
Range Quantile Queries: Another Virtue of Wavelet Trees
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
A Compressed Enhanced Suffix Array Supporting Fast String Matching
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
On Entropy-Compressed Text Indexing in External Memory
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
A Linear-Time Burrows-Wheeler Transform Using Induced Sorting
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Faster entropy-bounded compressed suffix trees
Theoretical Computer Science
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices
SIAM Journal on Computing
Information Processing Letters
Succinct dynamic dictionaries and trees
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Engineering a compressed suffix tree implementation
WEA'07 Proceedings of the 6th international conference on Experimental algorithms
LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics
Cell-probe lower bounds for succinct partial sums
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Fully-functional succinct trees
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Succinct representations of dynamic strings
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Practical compressed suffix trees
SEA'10 Proceedings of the 9th international conference on Experimental Algorithms
Dynamic rank-select structures with applications to run-length encoded texts
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
A framework for dynamizing succinct data structures
ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming
Compressed string dictionary look-up with edit distance one
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Compressed suffix trees for repetitive texts
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Multi-pattern matching with bidirectional indexes
Journal of Discrete Algorithms
Cross-document pattern matching
Journal of Discrete Algorithms
Hi-index | 0.01 |
Suffix trees are by far the most important data structure in stringology, with a myriad of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require Θ(n log n) bits of space, for a string of size n. This is considerably more than the n log2 σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Θ(n) extra bits. This is already spectacular, but the linear extra bits are still unsatisfactory when σ is small as in DNA sequences. In this article, we introduce the first compressed suffix tree representation that breaks this Θ(n)-bit space barrier. The Fully Compressed Suffix Tree (FCST) representation requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time. This includes extracting arbitrary text substrings, so the FCST replaces the text using almost the same space as the compressed text. An essential ingredient of FCSTs is the lowest common ancestor (LCA) operation. We reveal important connections between LCAs and suffix tree navigation. We also describe how to make FCSTs dynamic, that is, support updates to the text. The dynamic FCST also supports several operations. In particular, it can build the static FCST within optimal space and polylogarithmic time per symbol. Our theoretical results are also validated experimentally, showing that FCSTs are very effective in practice as well.