A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.

Index Terms: Collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.
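To make the operations named in the abstract concrete, the following is a minimal single-process sketch of their usual input/output semantics. It is not CCL's interface: the function names, the list-based model of a process group (index = rank within the group, element = that member's local data), the use of `None` for "receives nothing", and the circular interpretation of shift are all assumptions made for illustration. No real communication takes place.

```python
from functools import reduce as fold
from operator import add

# Hypothetical model: a process group is a list; position r holds the
# local data of the member with rank r. Each function returns the list
# of values the members hold after the collective completes.

def broadcast(local, root):
    """Every member ends up with the root's value."""
    return [local[root] for _ in local]

def reduce_to_root(local, op, root):
    """The root receives all values combined with op; others get None."""
    total = fold(op, local)
    return [total if rank == root else None for rank in range(len(local))]

def scatter(local, root):
    """The root holds one piece per member; member r receives piece r."""
    pieces = local[root]
    return [pieces[rank] for rank in range(len(local))]

def gather(local, root):
    """The root receives every member's value in rank order."""
    return [list(local) if rank == root else None for rank in range(len(local))]

def concatenate(local):
    """Every member receives every member's value in rank order."""
    return [list(local) for _ in local]

def shift(local, offset):
    """Assumed circular: member r receives from member (r - offset) mod n."""
    n = len(local)
    return [local[(rank - offset) % n] for rank in range(n)]

if __name__ == "__main__":
    group = [10, 20, 30, 40]  # four members, ranks 0..3
    print(broadcast(group, root=2))
    print(reduce_to_root(group, add, root=0))
    print(gather(group, root=3))
    print(shift(group, offset=1))
```

Modeling the group as an explicit list also hints at why process groups matter: each collective is defined only over the members of the list it is given, so two disjoint groups can run collectives without interfering with each other.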