On the sorting-complexity of suffix tree construction
Journal of the ACM (JACM)
The suffix tree of a string is the fundamental data structure of string processing. Recent focus on massive data sets has sparked interest in overcoming the memory bottlenecks of known algorithms for building and using suffix trees.

Our main contribution is a new algorithm for suffix tree construction in which we choreograph almost all disk accesses to be via the sort and scan primitives. This algorithm achieves optimal results in a variety of sequential and parallel computational models. Two of our results are:

1) In the traditional external memory model, in which only the number of disk accesses is counted, we achieve an optimal algorithm, both for single and multiple disk cases. This is the first optimal algorithm known for either model.

2) Traditional disk page access counting does not differentiate between random page accesses and block transfers involving several consecutive pages. This difference is routinely exploited by expert programmers to get fast algorithms on real machines. We adopt a simple accounting scheme and show that our algorithm achieves the same optimal tradeoff for block versus random page accesses as the one we establish for sorting.
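To make the role of the sort and scan primitives concrete, the following is a minimal sketch of how their costs are usually counted in the external memory model. It is not the paper's algorithm or its accounting scheme. It assumes a textbook multiway merge sort, and the names `scan_ios`, `sort_ios`, `n`, `m`, and `b` (item count, memory size in items, block size in items) are hypothetical, chosen for illustration. Constant factors are crude, so the output should be read as an order-of-magnitude estimate.

```python
import math


def scan_ios(n, b):
    """Block transfers needed to read n items sequentially, b items per block."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """Rough block-transfer count for a textbook external merge sort.

    Assumption: memory holds m items, so run formation produces ceil(n / m)
    sorted runs; each merge pass combines up to (m // b - 1) runs, keeping one
    block-sized buffer per input run plus one for output.
    """
    blocks = math.ceil(n / b)
    runs = math.ceil(n / m)
    fan_in = max(2, m // b - 1)
    passes = 1  # the run-formation pass
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        passes += 1
    # Every pass reads and writes each block once.
    return 2 * blocks * passes


if __name__ == "__main__":
    n, m, b = 10**6, 10**4, 100
    print("scan:", scan_ios(n, b), "I/Os")
    print("sort:", sort_ios(n, m, b), "I/Os")
```

An algorithm whose disk accesses are almost all sorts and scans inherits a cost that is a small multiple of these two quantities, which is why an optimal sorting bound is the natural yardstick in this setting.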