Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Compact pat trees
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Journal of Algorithms
An analysis of the Burrows—Wheeler transform
Journal of the ACM (JACM)
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Space-Economical Algorithms for Finding Maximal Unique Matches
CPM '02 Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Indexing text using the Ziv-Lempel trie
Journal of Discrete Algorithms - SPIRE 2002
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Constructing Suffix Tree for Gigabyte Sequences with Megabyte Memory
IEEE Transactions on Knowledge and Data Engineering
Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
SIAM Journal on Computing
Succinct suffix arrays based on run-length encoding
Nordic Journal of Computing
ACM Computing Surveys (CSUR)
Rank and select revisited and extended
Theoretical Computer Science
Compressed Suffix Trees with Full Functionality
Theory of Computing Systems
Dynamic entropy-compressed sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Space-efficient static trees and graphs
SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Obtaining provably good performance from suffix trees in secondary storage
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Inverted indexes for phrases and strings
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Scalable detection of frequent substrings by grammar-based compression
DS'11 Proceedings of the 14th international conference on Discovery science
Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Exploiting SIMD instructions in current processors to improve classical string algorithms
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
RCSI: scalable similarity search in thousand(s) of genomes
Proceedings of the VLDB Endowment
A Compressed Suffix Tree Based Implementation With Low Peak Memory Usage
Electronic Notes in Theoretical Computer Science (ENTCS)
Hi-index | 0.00 |
Suffix tree is one of the most important data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, often the main memory size becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length n from alphabet Σ = {A,C,G,T} can be stored in n log |Σ| &equlas; 2n bits, its suffix tree occupiesO(n log n) bits. In practice, the size difference easily reaches factor 50. We report on an implementation of the compressed suffix tree very recently proposed by Sadakane (2007). The compressed suffix tree occupies space proportional to the text size, that is, O(n log |Σ|) bits, and supports all typical suffix tree operations with at most log n factor slowdown. Our experiments show that, for example, on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of the normal suffix tree. At the same time, a representative algorithm is slowed down by factor 30. Our implementation follows the original proposal in spirit, but some internal parts are tailored toward practical implementation. Our construction algorithm has time requirement O(n log n log |Σ|) and uses closely the same space as the final structure while constructing it: on the 10MB DNA sequence, the maximum space usage during construction is only 1.5 times the final product size. As by-products, we develop a method to create Succinct Suffix Array directly from Burrows-Wheeler transform and a space-efficient version of the suffixes-insertion algorithm to build balanced parentheses representation of suffix tree from LCP information.