For collective communication routines to achieve high performance across platforms, they must adapt to the system architecture and use different algorithms in different situations. Current Message Passing Interface (MPI) implementations, such as MPICH and LAM/MPI, do not fully adapt to the system architecture and therefore fail to achieve high performance on many platforms. In this paper, we present a system that produces efficient MPI collective communication routines. By automatically generating topology-specific routines and using an empirical approach to select the best implementations, our system adapts to a given platform and constructs routines customized for it. Experimental results show that the tuned routines consistently achieve high performance on clusters with different network topologies.
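The empirical selection step described above can be illustrated with a small, hypothetical sketch: several candidate "algorithms" for an allgather-style collective are benchmarked on a representative workload, and the fastest one is chosen. This is not the paper's actual system; the function names, the pure-Python stand-ins for ring and recursive-doubling data movement, and the timing harness are all illustrative assumptions, with no real MPI involved.

```python
import time

# Hypothetical candidate implementations of an allgather-style operation,
# modeled as plain Python functions over per-process blocks. They mimic
# the data-movement order of the ring and recursive-doubling algorithms,
# not their actual network behavior.

def allgather_ring(bufs):
    # Ring-style: each process's block is appended in rank order.
    result = []
    for block in bufs:
        result.extend(block)
    return result

def allgather_recursive_doubling(bufs):
    # Recursive doubling: pairwise merges that double the gathered data
    # each round. Assumes the number of blocks is a power of two.
    chunks = [list(b) for b in bufs]
    step = 1
    while step < len(chunks):
        for i in range(0, len(chunks) - step, step * 2):
            chunks[i] = chunks[i] + chunks[i + step]
        step *= 2
    return chunks[0]

def empirical_select(candidates, workload, trials=3):
    """Time each candidate on the workload and return the fastest one's name.

    A real tuning system would benchmark per message size and topology and
    cache the winner; this sketch just picks one winner for one workload.
    """
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(trials):
            fn(workload)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name
```

In a full tuner, `empirical_select` would be run once per (operation, message size, communicator) combination at installation or startup time, and the chosen implementation would be bound into the collective routine for subsequent calls.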