Algorithms in C
Software—Practice & Experience
Overview of the second text retrieval conference (TREC-2)
TREC-2 Proceedings of the second conference on Text retrieval conference
Splaysort: fast, versatile, practical
Software—Practice & Experience
On sorting strings in external memory (extended abstract)
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
Journal of Experimental Algorithmics (JEA)
Fast algorithms for sorting and searching strings
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Burst tries: a fast, efficient data structure for string keys
ACM Transactions on Information Systems (TOIS)
The Design and Analysis of Computer Algorithms
The Design and Analysis of Computer Algorithms
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Cache-conscious sorting of large sets of strings with dynamic tries
Journal of Experimental Algorithmics (JEA)
Introduction to the Design and Analysis of Algorithms (2nd Edition)
Introduction to the Design and Analysis of Algorithms (2nd Edition)
Implementing sorting in database systems
ACM Computing Surveys (CSUR)
Cache-oblivious string B-trees
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Using random sampling to build approximate tries for efficient string sorting
Journal of Experimental Algorithmics (JEA)
Cache-efficient string sorting using copying
Journal of Experimental Algorithmics (JEA)
Engineering a cache-oblivious sorting algorithm
Journal of Experimental Algorithmics (JEA)
Comparing integer data structures for 32- and 64-bit keys
Journal of Experimental Algorithmics (JEA)
Hi-index | 0.00 |
Burstsort is a trie-based string sorting algorithm that distributes strings into small buckets whose contents are then sorted in cache. This approach has earlier been demonstrated to be efficient on modern cache-based processors [Sinha & Zobel, JEA 2004]. In this paper, we introduce improvements that reduce by a significant margin the memory requirements of burstsort. Excess memory has been reduced by an order of magnitude so that it is now less than 1% greater than an in-place algorithm. These techniques can be applied to existing variants of burstsort, as well as other string algorithms. We redesigned the buckets, introducing sub-buckets and an index structure for them, which resulted in an order-of-magnitude space reduction. We also show the practicality of moving some fields from the trie nodes to the insertion point (for the next string pointer) in the bucket; this technique reduces memory usage of the trie nodes by one-third. Significantly, the overall impact on the speed of burstsort by combining these memory usage improvements is not unfavourable on real-world string collections. In addition, during the bucket-sorting phase, the string suffixes are copied to a small buffer to improve their spatial locality, lowering the running time of burstsort by up to 30%.