CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers

Authors:
Vasanth Bala, Jehoshua Bruck, Robert Cypher, Pablo Elustondo, Alex Ho, Ching-Tien Ho, Shlomo Kipnis, Marc Snir
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1995

Abstract

A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library gives users a convenient programming interface, efficient communication operations, and portability. We have designed a library of this kind, the Collective Communication Library (CCL), for IBM's line of scalable parallel computer products. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library, focusing on three novel aspects of the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new, tunable algorithms based on a realistic point-to-point communication model.

Index Terms: Collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.
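The abstract refers to process groups and to tunable algorithms built on a point-to-point communication model, but it does not spell out either one. The sketch below is a hedged illustration under stated assumptions, not CCL's interface or the paper's exact algorithms. It assumes a simple linear model in which sending m bytes costs alpha + m*beta (start-up latency alpha, per-byte time beta). A process group is represented as a plain list of global process ids, which is a hypothetical stand-in. The sketch builds a binomial-tree broadcast schedule over a group with an arbitrary root, and it "tunes" the broadcast by choosing between that tree and a scatter followed by a ring allgather. All function names and cost formulas here are illustrative.

```python
# Illustrative sketch, not CCL's API. Assumed linear cost model:
# sending m bytes point-to-point costs alpha + m * beta, where
# alpha = per-message start-up time (s) and beta = per-byte time (s/byte).


def binomial_schedule(p):
    """Binomial-tree broadcast over relative ranks 0..p-1 with root 0.

    Returns a list of rounds; each round is a list of (sender, receiver)
    pairs. In round k, every rank r < 2**k already holds the data and
    forwards it to r + 2**k, so ceil(log2 p) rounds reach all p ranks.
    """
    rounds = []
    step = 1
    while step < p:
        rounds.append([(r, r + step) for r in range(step) if r + step < p])
        step *= 2
    return rounds


def group_broadcast_schedule(group, root):
    """Map the relative-rank schedule onto a process group.

    `group` is a hypothetical list of global process ids standing in for
    a process group; `root` must be one of its members.
    """
    p = len(group)
    base = group.index(root)

    def member(rel):
        # Relative rank 0 is the root; ranks wrap around the group list.
        return group[(base + rel) % p]

    return [[(member(s), member(d)) for s, d in rnd] for rnd in binomial_schedule(p)]


def cost_binomial(p, m, alpha, beta):
    # Each of the ceil(log2 p) rounds forwards the full m-byte message.
    return len(binomial_schedule(p)) * (alpha + m * beta)


def cost_scatter_allgather(p, m, alpha, beta):
    # Scatter m/p-byte pieces down a binomial tree, then run a ring allgather.
    # The scatter byte count is exact when p is a power of two.
    if p == 1:
        return 0.0
    rounds = len(binomial_schedule(p))
    scatter = rounds * alpha + (p - 1) / p * m * beta
    allgather = (p - 1) * alpha + (p - 1) / p * m * beta
    return scatter + allgather


def choose_broadcast(p, m, alpha, beta):
    """Pick the cheaper strategy under the assumed model (the tuning step)."""
    tree = cost_binomial(p, m, alpha, beta)
    split = cost_scatter_allgather(p, m, alpha, beta)
    return "binomial" if tree <= split else "scatter-allgather"


if __name__ == "__main__":
    print(group_broadcast_schedule([3, 7, 9, 12], root=9))
    for m in (8, 1_000_000):
        print(m, choose_broadcast(8, m, alpha=50e-6, beta=1e-8))
```

With alpha = 50 µs and beta = 10 ns per byte on 8 processes, this model prefers the tree for an 8-byte message, which is dominated by start-up costs. It prefers scatter plus allgather for a 1 MB message, which is dominated by bandwidth. In this sketch, tuning amounts to measuring alpha and beta on the target machine and letting the model choose.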