We study the problem of sorting n numbers on a p-processor bulk-synchronous parallel (BSP) computer, a parallel multicomputer that allows general processor-to-processor communication rounds, provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use $O\left(\frac{n\log n}{p}\right)$ internal computation time and $O\left(\frac{\log n}{\log (h+1)}\right)$ communication rounds for $h=\Theta(n/p)$. The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant in the (practical) situations where $p\le n^{1-1/c}$ for a constant $c\ge 1$. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values of p: just computing the "or" of n bits distributed evenly to the first O(n/h) of an arbitrary number of processors in a BSP computer requires $\Omega(\log n/\log (h+1))$ communication rounds.
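To give a feel for the round count in the "or" bound, the following is a minimal sketch of a fan-in-h reduction tree, simulated sequentially. It is not the sorting method described above, and it is only one plausible way to compute the "or" within the stated communication limit. The function name `bsp_or_rounds`, the grouping scheme, and the requirement h >= 2 are assumptions made for illustration.

```python
def bsp_or_rounds(bits, h):
    """Simulate an OR reduction where no processor receives more than h items per round.

    Returns (result, rounds). Hypothetical helper, not taken from the paper.
    """
    if h < 2:
        raise ValueError("h must be at least 2 so that each round shrinks the problem")

    # Initial layout: the n bits are spread evenly over about n/h processors.
    # Each processor ORs its own (at most h) bits locally, with no communication.
    values = [any(bits[i:i + h]) for i in range(0, len(bits), h)]

    rounds = 0
    while len(values) > 1:
        # One communication round: each group of up to h processors sends its
        # partial OR to a group leader, which receives at most h items.
        values = [any(values[i:i + h]) for i in range(0, len(values), h)]
        rounds += 1

    result = values[0] if values else False
    return result, rounds


# Example: n = 4096 bits with h = 16 starts on 256 processors,
# then shrinks 256 -> 16 -> 1 in two rounds.
print(bsp_or_rounds([False] * 4095 + [True], 16))  # (True, 2)
```

Each round divides the number of active processors by h, so the loop runs roughly $\log(n/h)/\log h$ times, which is within a constant factor of the $\log n/\log(h+1)$ bound quoted above whenever h >= 2.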