Communication-Efficient Parallel Sorting

Authors:
Michael T. Goodrich
Venue:
SIAM Journal on Computing
Year:
1999

Cited by (9):

Linear work suffix array construction. Journal of the ACM (JACM)
Fundamental parallel algorithms for private-cache chip multiprocessors. Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Efficient parallel Text Retrieval techniques on Bulk Synchronous Parallel (BSP)/Coarse Grained Multicomputers (CGM). The Journal of Supercomputing
Simple linear work suffix array construction. ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Towards optimizing energy costs of algorithms for shared memory architectures. Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Sorting, searching, and simulation in the mapreduce framework. ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Space-round tradeoffs for MapReduce computations. Proceedings of the 26th ACM international conference on Supercomputing
A lower bound technique for communication on BSP with application to the FFT. Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Optimal deterministic routing and sorting on the congested clique. Proceedings of the 2013 ACM symposium on Principles of distributed computing

Abstract

We study the problem of sorting $n$ numbers on a $p$-processor bulk-synchronous parallel (BSP) computer, a parallel multicomputer that allows general processor-to-processor communication rounds, provided each processor sends and receives at most $h$ items in any round. We provide parallel sorting methods that use $O\left(\frac{n \log n}{p}\right)$ internal computation time and $O\left(\frac{\log n}{\log (h+1)}\right)$ communication rounds for $h=\Theta(n/p)$. The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant in the (practical) situations where $p\le n^{1-1/c}$ for a constant $c\ge 1$. In fact, our bound on the number of communication rounds is asymptotically optimal for the full range of values of $p$: we show that just computing the "or" of $n$ bits distributed evenly to the first $O(n/h)$ of an arbitrary number of processors in a BSP computer requires $\Omega(\log n/\log (h+1))$ communication rounds.
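
The round bound in the abstract can be checked numerically with a small sketch. The code below is an illustration only, not the paper's sorting method. The function names `round_bound` and `or_rounds` are hypothetical, the constant hidden in $h=\Theta(n/p)$ is assumed to be 1, and the "or" is computed by a plain fan-in-$h$ tree, which is one simple way to match the shape of the $\log n/\log (h+1)$ bound.

```python
import math


def round_bound(n: int, p: int) -> float:
    """Evaluate log n / log(h + 1), ignoring the constant in the O(.).

    Assumption: h is taken to be exactly ceil(n / p).
    """
    h = -(-n // p)  # integer ceiling of n / p
    return math.log(n) / math.log(h + 1)


def or_rounds(n: int, h: int) -> int:
    """Count rounds of a fan-in-h tree that ORs n bits.

    Assumption: the bits start h per processor, so the first ceil(n / h)
    processors each reduce their local bits to one partial result with
    no communication. In every round, each receiving processor takes in
    at most h partial results, respecting the BSP limit of h items.
    """
    if h < 2:
        raise ValueError("this sketch needs h >= 2 so the tree shrinks")
    partial_results = -(-n // h)
    rounds = 0
    while partial_results > 1:
        partial_results = -(-partial_results // h)
        rounds += 1
    return rounds


if __name__ == "__main__":
    n = 10**6
    # p = n^(1 - 1/c) with c = 2, so the bound should not exceed c.
    print(round_bound(n, p=1000))
    print(or_rounds(n, h=1000))
    print(or_rounds(n, h=10))
```

With $n = 10^6$ and $p = 1000 = n^{1-1/2}$, the evaluated bound stays below $c = 2$, which is the constant-round regime the abstract describes. Shrinking $h$ to 10 makes the tree deeper, in line with the $\log n/\log (h+1)$ growth.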