Compiled communication has recently been proposed to improve communication performance for clusters of workstations. The idea of compiled communication is to apply more aggressive optimizations to communications whose information is known at compile time. Existing MPI libraries do not support compiled communication. In this paper, we present an MPI prototype, CC-MPI, that supports compiled communication on Ethernet switched clusters. The unique feature of CC-MPI is that it allows the user to manage network resources such as multicast groups directly and to optimize communications based on the availability of the communication information. CC-MPI optimizes one-to-all, one-to-many, all-to-all, and many-to-many collective communication routines using the compiled communication technique. We describe the techniques used in CC-MPI and report its performance. The results show that the communication performance of Ethernet switched clusters can be significantly improved through compiled communication.
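The general idea can be illustrated with a toy model. The sketch below is not CC-MPI's API, and every name in it (`MulticastNetwork`, `create_group`, `dynamic_one_to_many`, `compiled_one_to_many`) is hypothetical. It assumes only that setting up a multicast group is a separate, countable operation, and contrasts a one-to-many routine that sets up the group on every call with one that sets it up once because the receiver set is known ahead of time.

```python
class MulticastNetwork:
    """Hypothetical stand-in for a network; it only counts operations."""

    def __init__(self):
        self.setups = 0  # number of multicast-group setups performed
        self.sends = 0   # number of multicast sends performed

    def create_group(self, members):
        # Assumption: group setup is a distinct step with its own cost.
        self.setups += 1
        return frozenset(members)

    def multicast(self, group, payload):
        # Deliver the same payload to every member of the group.
        self.sends += 1
        return {rank: payload for rank in group}


def dynamic_one_to_many(net, receivers, payloads):
    """Receiver set not known in advance: set up the group for each message."""
    delivered = []
    for payload in payloads:
        group = net.create_group(receivers)
        delivered.append(net.multicast(group, payload))
    return delivered


def compiled_one_to_many(net, receivers, payloads):
    """Receiver set known in advance: set up the group once and reuse it."""
    group = net.create_group(receivers)
    return [net.multicast(group, payload) for payload in payloads]
```

In this model both routines deliver identical data, but for n messages the first performs n group setups and the second performs one. How much that saves on a real Ethernet switched cluster depends on the actual cost of group management, which this sketch does not model.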