Distribution sort with randomized cycling

  • Authors:
  • Jeffrey Scott Vitter; David A. Hutchinson

  • Affiliations:
  • Department of Computer Science, Duke University, Durham, NC; Department of Computer Science, Duke University, Durham, NC

  • Venue:
  • SODA '01: Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2001

Abstract

Parallel independent disks can enhance the performance of external memory (EM) algorithms, but programming them effectively is often difficult. In this paper we develop randomized variants of distribution sort for use with parallel independent disks. We propose a simple variant called randomized cycling distribution sort (RCD) and prove that it has optimal expected I/O complexity. The analysis uses a novel reduction to a model with significantly fewer probabilistic interdependencies. Experimental evidence is provided to support its practicality. Other simple variants are also examined experimentally and appear to offer advantages similar to those of RCD. Based on ideas in RCD, we propose general techniques that transparently simulate algorithms developed for the unrealistic multihead disk model so that they can be run on the realistic parallel disk model. The simulation is optimal for two important classes of algorithms: multipass algorithms, which make a complete pass through their data before accessing any element a second time, and algorithms based on the well-known distribution paradigm of EM computation.
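The core idea of randomized cycling can be illustrated with a small in-memory simulation of one distribution pass. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that each bucket writes its blocks in cycles of D, visiting the D disks in an independently drawn random order, and that a fresh random order is drawn for every cycle. The names `RandomizedCyclingAllocator` and `distribute` are hypothetical, and the write buffering and I/O scheduling analyzed in the paper are not modeled. Because each cycle visits every disk exactly once, any D consecutive blocks of one bucket lie on D distinct disks. They can then be read back in parallel in the next pass, while randomization across buckets spreads the write load.

```python
import bisect
import random
from typing import Dict, List, Tuple


class RandomizedCyclingAllocator:
    """Hypothetical helper: picks a disk for each output block of each bucket.

    Assumption: every bucket cycles through an independent random permutation
    of the D disks and draws a fresh permutation when a cycle is exhausted.
    """

    def __init__(self, num_disks: int, seed: int = 0) -> None:
        if num_disks < 1:
            raise ValueError("num_disks must be positive")
        self.num_disks = num_disks
        self.rng = random.Random(seed)
        self._perm: Dict[int, List[int]] = {}  # current disk order per bucket
        self._pos: Dict[int, int] = {}         # next index into that order

    def next_disk(self, bucket: int) -> int:
        """Return the disk that receives this bucket's next block."""
        pos = self._pos.get(bucket, self.num_disks)
        if pos == self.num_disks:              # cycle finished: draw a new order
            perm = list(range(self.num_disks))
            self.rng.shuffle(perm)
            self._perm[bucket] = perm
            pos = 0
        self._pos[bucket] = pos + 1
        return self._perm[bucket][pos]


def distribute(records: List[int], splitters: List[int], block_size: int,
               num_disks: int, seed: int = 0
               ) -> Dict[int, List[Tuple[int, List[int]]]]:
    """Simulate one distribution pass.

    Records are routed to buckets by the sorted splitters. Each bucket buffer
    is flushed as a block whenever it fills, and each block is placed on the
    disk chosen by randomized cycling. Returns {disk: [(bucket, block), ...]}.
    """
    alloc = RandomizedCyclingAllocator(num_disks, seed)
    buffers: List[List[int]] = [[] for _ in range(len(splitters) + 1)]
    disks: Dict[int, List[Tuple[int, List[int]]]] = {d: [] for d in range(num_disks)}

    def flush(b: int) -> None:
        disks[alloc.next_disk(b)].append((b, buffers[b]))
        buffers[b] = []

    for r in records:
        b = bisect.bisect_right(splitters, r)  # bucket index in 0..len(splitters)
        buffers[b].append(r)
        if len(buffers[b]) == block_size:
            flush(b)
    for b in range(len(buffers)):
        if buffers[b]:
            flush(b)                           # write any partial final block
    return disks
```

For example, `distribute(list(range(100)), [25, 50, 75], block_size=5, num_disks=4)` produces four buckets of five blocks each. Each bucket's first four blocks occupy all four disks, and its fifth block starts a new random cycle.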