A fundamental challenge for parallel computing is to obtain high-level, architecture-independent algorithms that execute efficiently on general-purpose parallel machines. With the emergence of message-passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives provide an assortment of collective communication routines, they do not handle an important communication event in which most or all processors have non-uniformly sized personalized messages to exchange with each other. In this paper we focus on h-relation personalized communication, in which each processor sends and receives at most h elements; an efficient implementation of this primitive enables high-performance implementations of a large class of algorithms. While most previous h-relation algorithms use randomization, this paper presents a new deterministic approach for h-relation personalized communication with asymptotically optimal complexity for h ≥ p², where p is the number of processors. As an application, we present an efficient algorithm for stable integer sorting. The algorithms presented in this paper have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, and the Intel Paragon. Our experimental results are consistent with the theoretical analysis and illustrate the scalability and efficiency of our algorithms across different platforms. In fact, they appear to outperform all similar algorithms known to the authors on these platforms.
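To make the h-relation setting concrete, here is a minimal sequential simulation of one generic deterministic two-phase routing idea. Each source sorts its outgoing elements by destination, deals them round-robin to p intermediate processors, and the intermediates then forward them to their final destinations. This is an illustrative assumption, not the algorithm from this paper, and the helper names `two_phase_route` and `random_h_relation` are hypothetical. The sketch reports the largest single processor-to-processor transfer in each phase. Phase 1 moves at most ceil(h/p) elements per pair. In phase 2 each source contributes at most ceil(c/p) of its c elements for a given destination to any one intermediate, so the per-pair load stays below h/p + p. That bound is O(h/p) once h ≥ p².

```python
import random
from collections import defaultdict


def two_phase_route(outgoing, p):
    """Simulate a simple deterministic two-phase h-relation route (illustrative sketch).

    outgoing[s] is a list of (dest, payload) pairs held by processor s.
    Returns (delivered, phase1_max, phase2_max), where delivered[d] lists the
    (source, payload) pairs that end up at processor d, and the two maxima are the
    largest number of elements any single processor pair exchanges in each phase.
    """
    intermediate = [[] for _ in range(p)]
    phase1 = defaultdict(int)  # (source, intermediate) -> element count
    for s in range(p):
        # Stable sort by destination so each destination's elements form a contiguous run.
        msgs = sorted(outgoing[s], key=lambda m: m[0])
        for i, (dest, payload) in enumerate(msgs):
            j = i % p  # deal round-robin; a run of length c gives each j at most ceil(c/p)
            intermediate[j].append((s, dest, payload))
            phase1[(s, j)] += 1

    delivered = [[] for _ in range(p)]
    phase2 = defaultdict(int)  # (intermediate, destination) -> element count
    for j in range(p):
        for s, dest, payload in intermediate[j]:
            delivered[dest].append((s, payload))
            phase2[(j, dest)] += 1

    return delivered, max(phase1.values(), default=0), max(phase2.values(), default=0)


def random_h_relation(p, h, seed=0):
    """Build a random instance where every processor sends exactly h and receives exactly h."""
    rng = random.Random(seed)
    dests = [d for d in range(p) for _ in range(h)]
    rng.shuffle(dests)
    return [[(dests[s * h + k], (s, k)) for k in range(h)] for s in range(p)]


if __name__ == "__main__":
    p, h = 4, 32  # here h >= p * p, the regime the abstract highlights
    out = random_h_relation(p, h)
    delivered, m1, m2 = two_phase_route(out, p)
    print("phase 1 max:", m1, "bound:", -(-h // p))
    print("phase 2 max:", m2, "bound:", h // p + p)
```

Running the script prints the observed per-pair maxima next to the bounds. The concrete values depend on the random instance, but they stay within those bounds. The sort-then-deal step is what keeps the scheme deterministic. No randomization is used to spread traffic, but the second phase carries an additive p term, and that term only becomes negligible when h is large relative to p².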