A bridging model for parallel computation
Communications of the ACM
A comparison of sorting algorithms for the connection machine CM-2
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Parallel sorting by regular sampling
Journal of Parallel and Distributed Computing
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
On the versatility of parallel sorting by regular sampling
Parallel Computing
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Empirical evaluation of the CRAY-T3D: a compiler perspective
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Parallel algorithms for image histogramming and connected components with an experimental study
Journal of Parallel and Distributed Computing
Practical parallel algorithms for personalized communication and integer sorting
Journal of Experimental Algorithmics (JEA)
Deterministic sorting and randomized median finding on the BSP model
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Practical Parallel Algorithms for Dynamic Data Redistribution, Median Finding, and Selection
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
MPI: A Message-Passing Interface Standard
MPI: A Message-Passing Interface Standard
Designing Practical Efficient Algorithms for Symmetric Multiprocessors
ALENEX '99 Selected papers from the International Workshop on Algorithm Engineering and Experimentation
Algorithm engineering for parallel computation
Experimental algorithmics
Assembly of Large Genomes from Paired Short Reads
BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
SGL: towards a bridging model for heterogeneous hierarchical platforms
International Journal of High Performance Computing and Networking
Comparison based sorting for systems with multiple GPUs
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Bitonic sort on a chained-cubic tree interconnection network
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
We introduce a new deterministic parallel sorting algorithm for distributed memory machines based on the regular sampling approach. The algorithm uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Moreover, unlike previous variations, our algorithm efficiently handles the presence of duplicate values without the overhead of tagging each element with a unique identifier. This algorithm was implemented in SPLIT-C and run on a variety of platforms, including the Thinking Machines CM-5, the IBM SP-2-WN, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results illustrate the efficiency and scalability of our algorithm across different platforms. In fact, the performance compares closely to that of our random sample sort algorithm, which seems to outperform all similar algorithms known to the authors on these platforms. Together, their performance is nearly invariant over the set of input distributions, unlike previous efficient algorithms. However, unlike our randomized sorting algorithm, the performance and memory requirements of our regular sorting algorithm can be deterministically guaranteed.