STAR-MPI: self tuned adaptive routines for MPI collective operations

Authors:
Ahmad Faraj; Xin Yuan; David Lowenthal
Affiliations:
Florida State University, Tallahassee, FL; Florida State University, Tallahassee, FL; University of Georgia, Athens, GA
Venue:
Proceedings of the 20th annual international conference on Supercomputing
Year:
2006

Abstract

Message Passing Interface (MPI) collective communication routines are widely used in parallel applications. For a collective communication routine to achieve high performance for different applications on different platforms, it must adapt to both the system architecture and the application workload. Current MPI implementations do not support such software adaptability and fail to achieve high performance on many platforms. In this paper, we present STAR-MPI (Self Tuned Adaptive Routines for MPI collective operations), a set of MPI collective communication routines that are capable of adapting to system architecture and application workload. For each operation, STAR-MPI maintains a set of communication algorithms, each of which can potentially be efficient in different situations. As an application executes, a STAR-MPI routine applies the Automatic Empirical Optimization of Software (AEOS) technique at run time to dynamically select the best-performing algorithm for the application on the platform. We describe the techniques used in STAR-MPI, analyze STAR-MPI overheads, and evaluate the performance of STAR-MPI with applications and benchmarks. The results of our study indicate that STAR-MPI is robust and efficient. It is able to find efficient algorithms with reasonable overheads, and it substantially outperforms traditional MPI implementations in many cases.
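The core mechanism described above, keeping several candidate algorithms per collective and choosing the fastest empirically while the application runs, can be illustrated with a small single-process sketch. Everything in it is a hypothetical illustration rather than STAR-MPI's actual interface or policy: the `AdaptiveRoutine` class, the round-robin measurement order, the `trials_per_algo` parameter, the lowest-mean-time selection rule, and the algorithm names are all assumptions. In a real MPI library every rank must pick the same algorithm, so per-rank timings would have to be combined across processes (for example with a reduction) before choosing; this sketch omits that step and uses a fake clock so the result is deterministic.

```python
import time
from typing import Callable, Dict, List, Optional


class AdaptiveRoutine:
    """Hypothetical sketch of AEOS-style run-time selection for one collective.

    Measurement phase: rotate through the candidate algorithms on successive
    calls and time each call. After every candidate has been timed
    `trials_per_algo` times, commit to the one with the lowest mean time.
    """

    def __init__(self, algorithms: Dict[str, Callable[..., object]],
                 trials_per_algo: int = 3,
                 clock: Callable[[], float] = time.perf_counter):
        self.algorithms = algorithms
        self.names: List[str] = list(algorithms)
        self.trials_per_algo = trials_per_algo
        self.clock = clock
        self.samples: Dict[str, List[float]] = {n: [] for n in self.names}
        self.calls = 0  # number of measured calls so far
        self.best: Optional[str] = None

    def __call__(self, *args, **kwargs):
        # Selection done: every later call goes straight to the winner.
        if self.best is not None:
            return self.algorithms[self.best](*args, **kwargs)
        # Still measuring: pick the next candidate round-robin and time it.
        name = self.names[self.calls % len(self.names)]
        start = self.clock()
        result = self.algorithms[name](*args, **kwargs)
        self.samples[name].append(self.clock() - start)
        self.calls += 1
        # Once all candidates have their samples, choose the lowest mean time.
        if self.calls == self.trials_per_algo * len(self.names):
            self.best = min(self.names,
                            key=lambda n: sum(self.samples[n]) / len(self.samples[n]))
        return result


# Usage with a fake clock: each stand-in "algorithm" advances the clock by a
# fixed cost instead of doing real communication.
class FakeClock:
    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()


def make_algo(cost: float) -> Callable[[object], object]:
    def algo(data: object) -> object:
        clock.t += cost  # pretend this algorithm took `cost` seconds
        return data
    return algo


allreduce = AdaptiveRoutine(
    {"ring": make_algo(5.0), "recursive_doubling": make_algo(2.0), "tree": make_algo(3.0)},
    trials_per_algo=2, clock=clock)

for step in range(8):
    allreduce([step])
print(allreduce.best)  # recursive_doubling
```

A fixed measurement budget keeps the tuning overhead bounded: once the winner is chosen, each call adds only a lookup on top of the selected algorithm. The trade-off is that, in this sketch, a choice made early is never revisited if the workload changes later.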