LEDA: a platform for combinatorial and geometric computing
Communications of the ACM
Efficient implementation of suffix trees
Software—Practice & Experience
A fully-dynamic data structure for external substring search
STOC '95 Proceedings of the twenty-seventh annual ACM symposium on Theory of computing
Guidelines for presentation and comparison of indexing techniques
ACM SIGMOD Record
PAT-tree-based keyword extraction for Chinese information retrieval
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
On sorting strings in external memory (extended abstract)
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
q-gram based database searching using a suffix array (QUASAR)
RECOMB '99 Proceedings of the third annual international conference on Computational molecular biology
Efficient suffix trees on secondary storage
Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
Distributed Generation of Suffix Arrays
CPM '97 Proceedings of the 8th Annual Symposium on Combinatorial Pattern Matching
Optimal suffix tree construction with large alphabets
FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science
Overcoming the Memory Bottleneck in Suffix Tree Construction
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
Supporting I/O-efficient scientific computation in TPIE
SPDP '95 Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing
External memory algorithms and data structures: dealing with massive data
ACM Computing Surveys (CSUR)
An experimental study of priority queues in external memory
Journal of Experimental Algorithmics (JEA)
LEDA-SM: Extending LEDA to Secondary Memory
WAE '99 Proceedings of the 3rd International Workshop on Algorithm Engineering
External Memory Data Structures
ESA '01 Proceedings of the 9th Annual European Symposium on Algorithms
Searching large text collections
Handbook of massive data sets
External memory data structures
Handbook of massive data sets
Better external memory suffix array construction
Journal of Experimental Algorithmics (JEA)
B-tries for disk-based string management
The VLDB Journal — The International Journal on Very Large Data Bases
The construction of full-text indexes on very large text collections is currently a pressing problem. The suffix array [16] is one of the most attractive full-text indexing data structures because of its simplicity, its space efficiency, and the powerful and fast search operations it supports. In this paper we analyze, theoretically and experimentally, the I/O complexity and the working space of six algorithms for constructing large suffix arrays. Additionally, we design a new external-memory algorithm that follows the basic philosophy underlying the algorithm in [13] but realizes it in a significantly different manner, thus combining its good practical qualities with efficient worst-case performance. To the best of our knowledge, this is the first study that provides a wide spectrum of possible approaches to the construction of suffix arrays in external memory, and it should therefore be helpful to anyone interested in building full-text indexes on very large text collections.
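The following is a minimal in-memory sketch in Python of what a suffix array is and how it supports substring search. It is a naive construction that comparison-sorts all suffixes. It is not the external-memory algorithm from the paper and not any of the six algorithms analyzed there. The function names `build_suffix_array` and `find_occurrences` are hypothetical and were chosen for illustration. The search works because sorted suffixes have non-decreasing length-m prefixes, so every occurrence of a pattern of length m falls in one contiguous range that two binary searches can locate.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Naive construction (hypothetical helper): sort suffix start positions
    by the suffix they begin. Quadratic-ish cost; for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of `pattern` in `text`, using two
    binary searches over the suffix array `sa` (hypothetical helper)."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in sa[start:lo] all begin with `pattern`.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    t = "banana"
    sa = build_suffix_array(t)
    print(sa)                              # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(t, sa, "ana"))  # [1, 3]
```

The naive sort accesses the text at scattered positions during each comparison. That becomes expensive once the text no longer fits in main memory, and reducing this cost is the goal of the external-memory construction algorithms discussed above.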