Overlapping computation with communication is central to obtaining high performance on distributed-memory multiprocessors. This report examines the overlapping capability of two distributed-memory multiprocessors: the EM-X and the IBM SP-2. The well-known bitonic sorting algorithm is used for the experiments. Various message sizes are used to determine when, where, how much, and why overlapping takes place. Experimental results indicate that both multiprocessors overlap up to 30% to 40% of communication time when the message size is approximately 1K integers. The EM-X is found to be insensitive to message size, yielding high overlap for various message sizes, while the SP-2 is effective for message sizes in the window of 512 to 2K integers.
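For readers unfamiliar with the benchmark, the following is a minimal sequential sketch of the bitonic sorting network in Python. It is not the implementation used in the report: the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration. In a distributed-memory version, each compare-exchange whose partner index lies on another processor would become a message exchange, which is presumably where communication could be overlapped with local computation.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using the bitonic sorting network.

    Illustrative sketch only: assumes len(values) is a power of two
    (or zero) and runs sequentially on a single list.
    """
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:                # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:            # distance between compare-exchange partners
            for i in range(n):
                partner = i ^ j
                if partner > i:  # handle each pair once
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for this direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The partner of element `i` at each step is `i ^ j`, so the communication pattern is fixed and independent of the data, which makes the algorithm convenient for measuring how message size affects overlap.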