Online persistent suffix tree construction has been considered impractical due to its excessive I/O costs. However, prior studies have not taken into account the effects of the buffer management policy and the internal node structure of the suffix tree on the I/O behavior of construction and of subsequent retrievals over the tree. In this paper, we study these two issues in detail in the context of large genomic DNA and protein sequences. In particular, we make the following contributions: (i) a novel, low-overhead buffering policy called TOP-Q, which improves the on-disk behavior of suffix tree construction and subsequent retrievals, and (ii) empirical evidence that the space-efficient linked-list representation of suffix tree nodes provides significantly inferior performance compared to the array representation. These results demonstrate that a careful choice of implementation strategies can make online persistent suffix tree construction considerably more scalable, in terms of the length of sequences indexed with a fixed memory budget, than currently perceived.
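To make the two node representations named in contribution (ii) concrete, the following is a minimal in-memory sketch, not the layout used in the paper. It assumes a four-letter DNA alphabet, and all names (`ArrayNode`, `ListNode`, `array_find_child`, `list_find_child`, and the build helpers) are hypothetical. Each lookup returns the number of child nodes it had to visit, which serves here only as a rough stand-in for the node accesses that could become page reads on disk.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ALPHABET = "ACGT"  # assumption: DNA alphabet, one slot per symbol


@dataclass
class ArrayNode:
    # Array representation: a fixed slot per alphabet symbol.
    # Costs space for empty slots, but a child is found by direct indexing.
    children: List[Optional["ArrayNode"]] = field(
        default_factory=lambda: [None] * len(ALPHABET)
    )


@dataclass
class ListNode:
    # Linked-list representation: first-child / next-sibling pointers.
    # Saves space, but finding a child means walking the sibling chain.
    symbol: str = ""
    first_child: Optional["ListNode"] = None
    next_sibling: Optional["ListNode"] = None


def array_find_child(node: ArrayNode, symbol: str) -> Tuple[Optional[ArrayNode], int]:
    """Return (child, number of child nodes visited)."""
    child = node.children[ALPHABET.index(symbol)]
    return child, (1 if child is not None else 0)


def list_find_child(node: ListNode, symbol: str) -> Tuple[Optional[ListNode], int]:
    """Return (child, number of child nodes visited)."""
    visited = 0
    cur = node.first_child
    while cur is not None:
        visited += 1
        if cur.symbol == symbol:
            return cur, visited
        cur = cur.next_sibling
    return None, visited


def build_array_parent() -> ArrayNode:
    """Parent with one child per alphabet symbol (array layout)."""
    parent = ArrayNode()
    for i in range(len(ALPHABET)):
        parent.children[i] = ArrayNode()
    return parent


def build_list_parent() -> ListNode:
    """Parent with one child per alphabet symbol, chained in ALPHABET order."""
    parent = ListNode()
    prev = None
    for s in ALPHABET:
        child = ListNode(symbol=s)
        if prev is None:
            parent.first_child = child
        else:
            prev.next_sibling = child
        prev = child
    return parent


if __name__ == "__main__":
    for s in ALPHABET:
        _, a = array_find_child(build_array_parent(), s)
        _, l = list_find_child(build_list_parent(), s)
        print(f"symbol {s}: array visits {a}, linked-list visits {l}")
```

In this toy setting the array lookup always visits a single child, while the linked-list lookup visits as many siblings as precede the target in the chain. Whether that difference translates into the performance gap reported in the abstract depends on page layout and buffering, which this sketch does not model.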