Efficient trie-based sorting of large sets of strings

  • Authors:
  • Ranjan Sinha; Justin Zobel

  • Affiliations:
  • School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne 3001, Australia; School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne 3001, Australia

  • Venue:
  • ACSC '03 Proceedings of the 26th Australasian computer science conference - Volume 16
  • Year:
  • 2003

Abstract

Sorting is a fundamental algorithmic task. Many general-purpose sorting algorithms have been developed, but efficiency gains can be achieved by designing algorithms for specific kinds of data, such as strings. In previous work we showed that burstsort, our trie-based algorithm for sorting strings, is more efficient on large data sets than all previous algorithms for this task. In this paper we re-evaluate some of the implementation details of burstsort, in particular the method for managing the buckets held at leaves. We show that a better choice of data structures further improves efficiency, at a small additional cost in memory. For sets of around 30,000,000 strings, our improved burstsort is nearly twice as fast as the previous best sorting algorithm.
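
The abstract describes burstsort only at a high level: strings are distributed through a trie into buckets held at the leaves, and the buckets are then sorted. The following is a minimal Python sketch of that general idea, not the authors' implementation. The names `TrieNode`, `insert`, `traverse`, `burstsort`, and `BURST_THRESHOLD` are hypothetical, the threshold value is arbitrary and kept tiny for illustration, buckets are plain Python lists, and Python's built-in sort stands in for whatever bucket-sorting routine the paper uses.

```python
BURST_THRESHOLD = 4  # hypothetical bucket capacity, tiny for illustration


class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode or list (a leaf bucket)
        self.exhausted = []  # strings that end exactly at this node


def insert(node, s, depth=0):
    """Route s down the trie and place it in a leaf bucket."""
    while True:
        if depth == len(s):
            # No characters left: s equals the prefix spelled by this node.
            node.exhausted.append(s)
            return
        c = s[depth]
        child = node.children.get(c)
        if child is None:
            node.children[c] = [s]  # start a new bucket
            return
        if isinstance(child, TrieNode):
            node, depth = child, depth + 1  # descend one level
            continue
        # child is a bucket: append, then burst it if it has grown too large.
        child.append(s)
        if len(child) > BURST_THRESHOLD:
            new_node = TrieNode()
            node.children[c] = new_node
            for t in child:
                # Redistribute on the next character.
                insert(new_node, t, depth + 1)
        return


def traverse(node, depth, out):
    """Collect strings in sorted order by an in-order walk of the trie."""
    out.extend(node.exhausted)  # the shortest strings come first
    for c in sorted(node.children):
        child = node.children[c]
        if isinstance(child, TrieNode):
            traverse(child, depth + 1, out)
        else:
            # Strings in a bucket share their first depth + 1 characters,
            # so comparing the remaining suffixes is sufficient.
            child.sort(key=lambda t: t[depth + 1:])
            out.extend(child)


def burstsort(strings):
    root = TrieNode()
    for s in strings:
        insert(root, s)
    out = []
    traverse(root, 0, out)
    return out


if __name__ == "__main__":
    words = ["banana", "band", "ban", "apple", "bandana", "bans", "bank", "bat"]
    print(burstsort(words))
```

In this sketch the choice of bucket structure (a growable list) and the burst threshold are the kind of implementation details the abstract says the paper re-evaluates; the specific choices the authors compare are not stated in the abstract and are not reproduced here.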