Overview of the second text retrieval conference (TREC-2)
TREC-2 Proceedings of the second conference on Text retrieval conference
Splaysort: fast, versatile, practical
Software—Practice & Experience
On sorting strings in external memory (extended abstract)
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Journal of Experimental Algorithmics (JEA)
Fast algorithms for sorting and searching strings
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Burst tries: a fast, efficient data structure for string keys
ACM Transactions on Information Systems (TOIS)
Cache oblivious search trees via binary trees of small height
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Algorithms in C
Implementing database operations using SIMD instructions
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Cache-conscious sorting of large sets of strings with dynamic tries
Journal of Experimental Algorithmics (JEA)
Implementing sorting in database systems
ACM Computing Surveys (CSUR)
Cache-oblivious string B-trees
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Using random sampling to build approximate tries for efficient string sorting
Journal of Experimental Algorithmics (JEA)
Cache-efficient string sorting using copying
Journal of Experimental Algorithmics (JEA)
Cache Efficient Radix Sort for String Sorting
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Engineering a cache-oblivious sorting algorithm
Journal of Experimental Algorithmics (JEA)
HAT-trie: a cache-conscious trie-based data structure for strings
ACSC '07 Proceedings of the thirtieth Australasian conference on Computer science - Volume 62
Cache-oblivious databases: Limitations and opportunities
ACM Transactions on Database Systems (TODS)
Engineering Radix Sort for Strings
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Comparing integer data structures for 32 and 64 bit keys
WEA'08 Proceedings of the 7th international conference on Experimental algorithms
Cache-Conscious collision resolution in string hash tables
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Burstsort is a trie-based string sorting algorithm that distributes strings into small buckets whose contents are then sorted in cache. This approach has previously been shown to be efficient on modern cache-based processors [Sinha & Zobel, JEA 2004]. In this article, we introduce improvements that significantly reduce the memory requirement of Burstsort: it is now less than 1% greater than that of an in-place algorithm. These techniques can be applied to existing variants of Burstsort, as well as to other string algorithms, such as those for string management. We redesigned the buckets, introducing sub-buckets and an index structure for them, which resulted in an order-of-magnitude space reduction. We also show the practicality of moving some fields from the trie nodes to the insertion point (for the next string pointer) in the bucket; this technique reduces the memory usage of the trie nodes by one-third. Importantly, the trade-off for the reduction in memory use is only a very slight increase in the running time of Burstsort on real-world string collections. In addition, during the bucket-sorting phase, the string suffixes are copied to a small buffer to improve their spatial locality, lowering the running time of Burstsort by up to 30%. These memory enhancements also allow the copy-based approach [Sinha et al., JEA 2006] to reduce its memory usage with negligible impact on speed.
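For readers unfamiliar with the underlying algorithm, the following is a minimal Python sketch of the basic idea only: strings are distributed through a trie into buckets, a bucket that grows past a threshold is "burst" into a new trie node, and each bucket is sorted on its remaining suffixes during an in-order traversal. It does not implement the sub-buckets, bucket index, relocated node fields, or suffix buffer described in the abstract. The names `TrieNode`, `burstsort`, `_burst`, `_collect` and the `threshold` parameter are hypothetical, and Python lists and dicts stand in for the compact arrays a real implementation would use.

```python
class TrieNode:
    """Trie node with one slot per leading character (illustrative layout)."""

    def __init__(self):
        self.children = {}   # char -> TrieNode, or a list acting as a bucket
        self.exhausted = []  # strings that end exactly at this node


def _burst(bucket, depth):
    """Turn a full bucket into a node, redistributing on character `depth`."""
    node = TrieNode()
    for s in bucket:
        if len(s) == depth:
            node.exhausted.append(s)
        else:
            node.children.setdefault(s[depth], []).append(s)
    return node


def _collect(node, depth, out):
    """In-order traversal: exhausted strings first, then children by character."""
    out.extend(node.exhausted)
    for c in sorted(node.children):
        child = node.children[c]
        if isinstance(child, TrieNode):
            _collect(child, depth + 1, out)
        else:
            # Every string in this bucket shares its first depth + 1
            # characters, so comparing the remaining suffixes is enough.
            out.extend(sorted(child, key=lambda t: t[depth + 1:]))


def burstsort(strings, threshold=4):
    """Return the strings in sorted order (threshold is an assumed parameter)."""
    root = TrieNode()
    for s in strings:
        node, depth = root, 0
        while True:
            if depth == len(s):
                node.exhausted.append(s)
                break
            c = s[depth]
            child = node.children.get(c)
            if isinstance(child, TrieNode):
                node, depth = child, depth + 1
                continue
            if child is None:
                node.children[c] = [s]
            else:
                child.append(s)
                if len(child) > threshold:
                    node.children[c] = _burst(child, depth + 1)
            break
    out = []
    _collect(root, 0, out)
    return out


words = ["banana", "band", "ban", "", "apple", "bandana",
         "ban", "b", "apply", "bandit", "bane", "banner"]
result = burstsort(words, threshold=2)
```

The small threshold here only forces bursts on a tiny input; in practice the bucket capacity would be chosen so that a bucket's contents fit in cache.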