Designing external memory data structures for string databases is of significant recent interest due to the proliferation of biological sequence data. The suffix tree is an important indexing structure that supports optimal algorithms when the data fits in main memory. In external memory, however, string B-trees provide the best known asymptotic performance for substring search and update operations. Work on external memory variants of suffix trees has largely focused on constructing suffix trees in external memory or on layout schemes for suffix trees that preserve link locality. In this paper, we present a new suffix tree layout scheme for secondary storage, together with construction, substring search, insertion and deletion algorithms that are competitive with the string B-tree. For a set of strings of total length n, a pattern p and disk blocks of size B, we provide a substring search algorithm that uses O(|p|/B + log_B n) disk accesses. We present algorithms for insertion and deletion of all suffixes of a string of length m that take O(m log_B (n+m)) and O(m log_B n) disk accesses, respectively. Our results demonstrate that suffix trees can be used directly as efficient secondary storage data structures for string and sequence data.
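To give a feel for how the stated bounds scale, the following minimal Python sketch evaluates their leading terms for given values of n, B, |p| and m. It is only an illustration: the function names are hypothetical, the constant factors hidden by the O-notation are ignored, and the results are not measured disk-access counts from the paper.

```python
import math

def search_bound(pattern_len: int, total_len: int, block_size: int) -> float:
    """Leading term of the search bound: |p|/B + log_B n (constants ignored)."""
    return pattern_len / block_size + math.log(total_len, block_size)

def insert_bound(m: int, total_len: int, block_size: int) -> float:
    """Leading term of the insertion bound: m * log_B (n + m)."""
    return m * math.log(total_len + m, block_size)

def delete_bound(m: int, total_len: int, block_size: int) -> float:
    """Leading term of the deletion bound: m * log_B n."""
    return m * math.log(total_len, block_size)

if __name__ == "__main__":
    # Assumed example values: n = 2^30 characters, B = 2^10, |p| = 2^10, m = 1000.
    n, B = 1024 ** 3, 1024
    print("search:", search_bound(1024, n, B))
    print("insert:", insert_bound(1000, n, B))
    print("delete:", delete_bound(1000, n, B))
```

Because the logarithm is taken to base B, the log_B n term grows very slowly: with the assumed values above it contributes about three block accesses, so for long patterns the |p|/B term dominates the search cost.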