Using random sampling to build approximate tries for efficient string sorting

Authors:
Ranjan Sinha;Justin Zobel
Affiliations:
RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia
Venue:
Journal of Experimental Algorithmics (JEA)
Year:
2005


Abstract

Algorithms for sorting large datasets can be made more efficient through careful use of the memory hierarchy and a reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a string-sorting algorithm that, on large sets of strings, is almost twice as fast as previous algorithms, primarily because it is more cache efficient. Burstsort dynamically builds a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SR-burstsort, DR-burstsort, and DRL-burstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants incur up to 37% fewer cache misses than the original burstsort, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained.
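The general idea described in the abstract, building a trie from a random sample and then using it to route every string to a bucket, can be illustrated with a minimal Python sketch. This is not an implementation of SR-, DR-, or DRL-burstsort, whose details the abstract does not give. The names `Node`, `build_trie`, `insert`, `traverse`, and `sampled_trie_sort`, the default 10% sample size, and the `threshold` parameter are all assumptions made for illustration. The sketch also says nothing about cache behaviour, which is the paper's main concern.

```python
import random


class Node:
    """Trie node (hypothetical layout, chosen for clarity rather than speed)."""

    def __init__(self):
        self.ended = []     # strings that end exactly at this node
        self.children = {}  # char -> Node (internal) or list (leaf bucket)


def build_trie(sample, depth=0, threshold=2):
    """Build the trie shape from the sample only; leaf buckets start empty."""
    node = Node()
    groups = {}
    for s in sample:
        if len(s) > depth:
            groups.setdefault(s[depth], []).append(s)
    for ch, group in groups.items():
        if len(group) > threshold:
            # Sample is dense under this prefix: refine with a deeper node.
            node.children[ch] = build_trie(group, depth + 1, threshold)
        else:
            node.children[ch] = []
    return node


def insert(root, s):
    """Route s through the fixed trie into a bucket (no bursting here)."""
    node, depth = root, 0
    while True:
        if depth == len(s):
            node.ended.append(s)
            return
        child = node.children.get(s[depth])
        if child is None:
            # Character unseen in the sample: create a bucket on demand.
            child = node.children[s[depth]] = []
        if isinstance(child, list):
            child.append(s)
            return
        node, depth = child, depth + 1


def traverse(node, out):
    """In-order walk: exhausted strings first, then children by character."""
    out.extend(node.ended)
    for ch in sorted(node.children):
        child = node.children[ch]
        if isinstance(child, list):
            out.extend(sorted(child))  # small bucket, sorted on its own
        else:
            traverse(child, out)


def sampled_trie_sort(strings, sample_size=None, threshold=2, seed=0):
    """Sort strings using a trie whose shape comes from a random sample."""
    rng = random.Random(seed)
    k = sample_size if sample_size is not None else max(1, len(strings) // 10)
    k = min(k, len(strings))
    root = build_trie(rng.sample(strings, k), threshold=threshold)
    for s in strings:
        insert(root, s)
    out = []
    traverse(root, out)
    return out


if __name__ == "__main__":
    print(sampled_trie_sort(["burst", "trie", "bucket", "sort", "burstsort"]))
    # prints ['bucket', 'burst', 'burstsort', 'sort', 'trie']
```

Because the trie shape is fixed before the full pass, a poorly chosen sample only makes some buckets larger; the output is still fully sorted, since every bucket is sorted independently and buckets are visited in character order.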