An MPI prototype for compiled communication on Ethernet switched clusters

Authors:
Amit Karwande;Xin Yuan;David K. Lowenthal
Affiliations:
Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA;Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA;Department of Computer Science, University of Georgia, Athens, GA 30602, USA
Venue:
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Year:
2005



Abstract

Compiled communication has recently been proposed to improve communication performance for clusters of workstations. The idea of compiled communication is to apply more aggressive optimizations to communications whose information is known at compile time. Existing MPI libraries do not support compiled communication. In this paper, we present an MPI prototype, CC-MPI, that supports compiled communication on Ethernet switched clusters. The unique feature of CC-MPI is that it allows the user to manage network resources such as multicast groups directly and to optimize communications based on the availability of the communication information. CC-MPI optimizes one-to-all, one-to-many, all-to-all, and many-to-many collective communication routines using the compiled communication technique. We describe the techniques used in CC-MPI and report its performance. The results show that the communication performance of Ethernet switched clusters can be significantly improved through compiled communication.