The input/output complexity of sorting and related problems
Communications of the ACM
Average-case analysis of algorithms and data structures
Handbook of theoretical computer science (vol. A)
Deterministic distribution sort in shared and distributed memory multiprocessors
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Greed sort: optimal deterministic sorting on parallel disks
Journal of the ACM (JACM)
Simple randomized mergesort on parallel disks
Parallel Computing - Special double issue: parallel I/O
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
Balls and bins: a study in negative dependence
Random Structures & Algorithms
Fast concurrent access to parallel disks
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
External memory algorithms and data structures: dealing with massive data
ACM Computing Surveys (CSUR)
Scalable Sweeping-Based Spatial Join
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Supporting I/O-efficient scientific computation in TPIE
SPDP '95 Proceedings of the 7th IEEE Symposium on Parallel and Distributeed Processing
Optimal External Memory Interval Management
SIAM Journal on Computing
Duality Between Prefetching and Queued Writing with Parallel Disks
SIAM Journal on Computing
Algorithms and data structures for external memory
Foundations and Trends® in Theoretical Computer Science
Hi-index | 0.01 |
Parallel independent disks can enhance the performance of external memory (EM) algorithms, but the programming task is often difficult. Each disk can service only one read or write request at a time; the challenge is to keep the disks as busy as possible. In this article, we develop a randomized allocation discipline for parallel independent disks, called randomized cycling. We show how it can be used as the basis for an efficient distribution sort algorithm, which we call randomized cycling distribution sort (RCD). We prove that the expected I/O complexity of RCD is optimal. The analysis uses a novel reduction to a scenario with significantly fewer probabilistic interdependencies. We demonstrate RCD's practicality by experimental simulations. Using the randomized cycling discipline, algorithms developed for the unrealistic multihead disk model can be simulated on the realistic parallel disk model for the class of multipass algorithms, which make a complete pass through their data before accessing any element a second time. In particular, algorithms based upon the well-known distribution and merge paradigms of EM computation can be optimally extended from a single disk to parallel disks.