Compiled communication has recently been proposed to improve communication performance on clusters of workstations. The idea is to apply more aggressive optimizations to communications whose information is known at compile time. Existing MPI libraries do not support compiled communication. In this paper, we present CC-MPI, an MPI prototype that supports compiled communication on Ethernet switched clusters. The unique feature of CC-MPI is that it allows the user to manage network resources such as multicast groups directly and to optimize communications based on the availability of the communication information. CC-MPI optimizes one-to-all, one-to-many, all-to-all, and many-to-many collective communication routines using the compiled communication technique. We describe the techniques used in CC-MPI and report its performance. The results show that the communication performance of Ethernet switched clusters can be significantly improved through compiled communication.
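To make the idea concrete, the following is a minimal sketch of what knowing the communication pattern ahead of time allows: an all-to-all exchange can be split into phases fixed before the program runs, so that in each phase every node sends to exactly one peer and receives from exactly one peer. This is a generic shift schedule chosen for illustration, not the scheduling algorithm used by CC-MPI, and the function names `phased_all_to_all` and `is_contention_free` are hypothetical. It also assumes all nodes hang off a single switch, so contention is only checked at the end nodes.

```python
def phased_all_to_all(num_nodes):
    """Build a static all-to-all schedule as a list of phases.

    Each phase is a list of (sender, receiver) pairs. In phase `shift`,
    node i sends to node (i + shift) mod num_nodes. Illustrative only:
    this is a generic shift pattern, not CC-MPI's actual schedule.
    """
    phases = []
    for shift in range(1, num_nodes):
        phase = [(src, (src + shift) % num_nodes) for src in range(num_nodes)]
        phases.append(phase)
    return phases


def is_contention_free(phase):
    """Check that no node sends twice or receives twice within a phase.

    Assumption: a single switch, so only end-node contention matters.
    """
    senders = [src for src, _ in phase]
    receivers = [dst for _, dst in phase]
    return len(set(senders)) == len(senders) and len(set(receivers)) == len(receivers)


if __name__ == "__main__":
    for number, phase in enumerate(phased_all_to_all(4), start=1):
        print(f"phase {number}: {phase} contention-free={is_contention_free(phase)}")
```

Because the schedule depends only on the number of nodes, it can be computed once when the pattern is known rather than negotiated at run time, which is the kind of opportunity compiled communication is meant to exploit.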