External string sorting: faster and cache-oblivious

  • Authors:
  • Rolf Fagerberg; Anna Pagh; Rasmus Pagh

  • Affiliations:
  • University of Southern Denmark, Odense M, Denmark; IT University of Copenhagen, København S, Denmark; IT University of Copenhagen, København S, Denmark

  • Venue:
  • STACS'06: Proceedings of the 23rd Annual Conference on Theoretical Aspects of Computer Science
  • Year:
  • 2006

Abstract

We give a randomized algorithm for sorting strings in external memory. For $K$ binary strings comprising $N$ words in total, our algorithm finds the sorted order and the longest common prefix sequence of the strings using $O(\frac{K}{B}\log_{M/B}(\frac{K}{M})\log(\frac{N}{K}) + \frac{N}{B})$ I/Os. This bound is never worse than $O(\frac{K}{B}\log_{M/B}(\frac{K}{M})\log\log_{M/B}(\frac{K}{M}) + \frac{N}{B})$ I/Os, and improves on the (deterministic) algorithm of Arge et al. (On sorting strings in external memory, STOC '97). The error probability of the algorithm can be chosen as $O(N^{-c})$ for any positive constant $c$. The algorithm even works in the cache-oblivious model under the tall cache assumption, i.e., assuming $M > B^{1+\epsilon}$ for some $\epsilon > 0$. An implication of our result is improved construction algorithms for external memory string dictionaries.
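
As a rough numerical illustration of the "never worse" claim, the two I/O bounds in the abstract can be evaluated side by side. This is only a sketch and not taken from the paper: the function names are hypothetical, the constants hidden by the O-notation are set to 1, logarithmic factors are clamped to at least 1, and the parameter values are arbitrary assumptions satisfying N ≥ K > M > B.

```python
import math


def _log_factor(K, M, B):
    # log_{M/B}(K/M), clamped to at least 1 (an assumption made for this sketch).
    return max(1.0, math.log(K / M, M / B))


def ios_first_bound(K, N, M, B):
    # (K/B) * log_{M/B}(K/M) * log(N/K) + N/B, with the hidden constant set to 1.
    L = _log_factor(K, M, B)
    return (K / B) * L * max(1.0, math.log2(N / K)) + N / B


def ios_loglog_bound(K, N, M, B):
    # (K/B) * log_{M/B}(K/M) * log log_{M/B}(K/M) + N/B, hidden constant set to 1.
    L = _log_factor(K, M, B)
    return (K / B) * L * max(1.0, math.log2(L)) + N / B


# Hypothetical parameters, all measured in words: memory M, block size B, K strings.
M, B, K = 2**20, 2**10, 2**40
for j in (0, 1, 4, 16, 30):
    N = K * 2**j  # average string length of 2^j words
    ratio = ios_first_bound(K, N, M, B) / ios_loglog_bound(K, N, M, B)
    print(f"N/K = 2^{j}: first bound / loglog bound = {ratio:.3f}")
```

With these conventions the ratio stays at or below 3 for every N ≥ K. One way to see why: write L for the clamped $\log_{M/B}(K/M)$ and x = N/K. If x ≤ L², then log x ≤ 2 log L; otherwise L < √x and L log x < 2x, so the extra cost is absorbed by the N/B term. The printed values are indicative only, since the real constants are not specified in the abstract.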