We study different approaches to implementing an optimal, stable two-way merge algorithm for distributed-memory parallel architectures. The algorithm takes as input two ordered sequences that are distributed blockwise across all available processes, such that each process owns a block of elements of each sequence. The task for each process is to produce an ordered block of elements of the stable merge of the two input sequences. We present an optimal, perfectly load-balanced, stable parallel algorithm that accomplishes this task. We describe three different implementation alternatives using one-sided communication of the Message-Passing Interface (MPI). Further, we discuss problematic issues with the current MPI 2.2 one-sided interface, as well as enabling features that may be found in future versions of the MPI standard. Experimental results on a large IBM Blue Gene/P supercomputer show perfect scalability of our implementation: with a fixed input size per process, the running time remains (almost) constant as the number of processes increases, and with a fixed total problem size, our implementation continues to reduce the time to solution for up to 32,768 MPI processes.
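The core idea of such a perfectly load-balanced merge can be sketched sequentially: each of the P processes computes, by binary search (often called a co-rank or merge-path search), the exact prefix lengths of the two input sequences that make up its output block, and then merges only those slices locally. The sketch below is an illustrative reconstruction under that assumption, not the authors' MPI implementation; the function names `corank` and `local_block` are hypothetical, and in the distributed setting the slices would be fetched from the owning processes (e.g., with one-sided MPI_Get) rather than taken from local arrays.

```python
def corank(k, a, b):
    """Return i such that the first k elements of the stable merge of a and b
    are exactly the merge of a[:i] and b[:k-i]; ties are resolved in favour
    of a (stability). Runs in O(log(min(len(a), len(b)))) time."""
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while True:
        i = (lo + hi) // 2
        j = k - i
        if i > 0 and j < len(b) and a[i - 1] > b[j]:
            hi = i - 1          # taking too many elements from a
        elif j > 0 and i < len(a) and b[j - 1] >= a[i]:
            lo = i + 1          # taking too few elements from a
        else:
            return i            # valid stable partition point

def local_block(p, P, a, b):
    """Compute the output block of process p out of P: elements
    k0..k1-1 of the stable merge, where block sizes differ by at most 1."""
    n = len(a) + len(b)
    k0, k1 = p * n // P, (p + 1) * n // P
    i0, i1 = corank(k0, a, b), corank(k1, a, b)
    j0, j1 = k0 - i0, k1 - i1
    # Local stable two-pointer merge of the two slices; a wins ties.
    out, i, j = [], i0, j0
    while i < i1 or j < j1:
        if j >= j1 or (i < i1 and a[i] <= b[j]):
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out
```

Because every process derives its block boundaries independently from the global ranks k0 and k1, the work is perfectly balanced (block sizes differ by at most one element) and no process-to-process coordination is needed beyond reading the remote input slices, which is what makes one-sided communication a natural fit.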