Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Journal of Algorithms
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Space-Economical Algorithms for Finding Maximal Unique Matches
CPM '02 Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Indexing text using the Ziv-Lempel trie
Journal of Discrete Algorithms - SPIRE 2002
Constructing Suffix Tree for Gigabyte Sequences with Megabyte Memory
IEEE Transactions on Knowledge and Data Engineering
Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
SIAM Journal on Computing
Succinct suffix arrays based on run-length encoding
Nordic Journal of Computing
ACM Computing Surveys (CSUR)
Compressed Suffix Trees with Full Functionality
Theory of Computing Systems
Dynamic entropy-compressed sequences and full-text indexes
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Transforming MPI source code based on communication patterns
Future Generation Computer Systems
A Compressed Enhanced Suffix Array Supporting Fast String Matching
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
ACM Transactions on Algorithms (TALG)
Practical compressed suffix trees
SEA'10 Proceedings of the 9th international conference on Experimental Algorithms
Hi-index | 0.00 |
Suffix tree is one of the most important data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, often the main memory size becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length n from alphabet Σ = {A, C, G, T} can be stored in n log |Σ| = 2n bits, its suffix tree occupies O(n log n) bits. In practice, the size difference easily reaches factor 50. We report on an implementation of the compressed suffix tree very recently proposed by Sadakane (Theory of Computing Systems, in press). The compressed suffix tree occupies space proportional to the text size, i.e. O(n log |Σ|) bits, and supports all typical suffix tree operations with at most log n factor slowdown. Our experiments show that, e.g. on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of normal suffix tree. At the same time, a representative algorithm is slowed down by factor 30. Our implementation follows the original proposal in spirit, but some internal parts are tailored towards practical implementation. Our construction algorithm has time requirement O(n log n log |Σ|) and uses closely the same space as the final structure while constructing it: on the 10 MB DNA sequence, the maximum space usage during construction is only 1.4 times the final product size.