Simple randomized mergesort on parallel disks

  • Authors:
  • Rakesh D. Barve; Edward F. Grove; Jeffrey Scott Vitter

  • Affiliations:
  • Dept. of Computer Science, Duke University, Durham, NC; Max-Planck-Institut für Informatik, Im Stadtwald, 66 Saarbrücken, Germany; Dept. of Computer Science, Duke University, Durham, NC

  • Venue:
  • Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
  • Year:
  • 1996

Abstract

We consider the problem of sorting a file of N records on the D-disk model of parallel I/O [VS94], in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM exhibits a limited use of randomization and also has a useful deterministic version. Generalizing the forecasting technique of [Knu73], our algorithm is able to read in, at any time, the right block from any disk, and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the right blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamental inefficiencies of previous mergesort algorithms. Our analysis technique involves a novel reduction to various maximum occupancy problems. We prove that the expected I/O performance of SRM is efficient under varying sizes of memory and that it compares favorably in practice to disk-striped mergesort (DSM). Our studies indicate that SRM outperforms DSM even when the number D of parallel disks is fairly small.
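
The cost model in the abstract can be illustrated with a small sketch. It is not the SRM algorithm or the paper's analysis. It only shows why the most heavily loaded disk matters when each I/O operation moves at most one block per disk. The function names `parallel_ios`, `striped_layout`, and `random_layout` are hypothetical and are introduced here for illustration only.

```python
import random
from collections import Counter


def parallel_ios(disk_of_block):
    """Fewest parallel I/Os needed to transfer the given blocks.

    disk_of_block[i] is the disk holding block i. Each I/O moves at most
    one block per disk, so the busiest disk determines the count.
    """
    if not disk_of_block:
        return 0
    return max(Counter(disk_of_block).values())


def striped_layout(num_blocks, num_disks):
    """Assumed round-robin placement: block i goes to disk i mod D."""
    return [i % num_disks for i in range(num_blocks)]


def random_layout(num_blocks, num_disks, rng):
    """Assumed independent uniform placement of each block on a disk."""
    return [rng.randrange(num_disks) for _ in range(num_blocks)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    num_blocks, num_disks = 1000, 10
    print("striped:", parallel_ios(striped_layout(num_blocks, num_disks)))
    print("random: ", parallel_ios(random_layout(num_blocks, num_disks, rng)))
```

Under these assumptions, a perfectly striped set of blocks needs about (number of blocks)/D operations, while a random placement needs as many operations as the fullest disk holds blocks. That quantity is a maximum occupancy in the balls-in-bins sense.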