Performance evaluation of adaptive MPI

Authors:
Chao Huang;Gengbin Zheng;Laxmikant Kalé;Sameer Kumar
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2006

Abstract

Processor virtualization via migratable objects is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations such as dynamic resource management. Charm++ is an early language/system that supports migratable objects. This paper describes Adaptive MPI (AMPI), an MPI implementation and extension that supports processor virtualization. AMPI implements virtual MPI processes (VPs), several of which may be mapped to a single physical processor. AMPI includes a powerful runtime support system that exploits the freedom it has in assigning VPs to processors. With this runtime system, AMPI supports features such as automatic adaptive overlapping of communication and computation, automatic load balancing, the flexibility to run on an arbitrary number of processors, and checkpoint/restart support. It also inherits communication optimizations from the Charm++ framework. The paper illustrates AMPI's performance benefits through a series of benchmarks and shows that AMPI is a portable and mature MPI implementation that offers various performance benefits to dynamic applications.
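
To make the idea of mapping several VPs onto one physical processor concrete, here is a minimal sketch in Python. It is not AMPI code and does not reproduce the load balancing strategies evaluated in the paper: the function names (`block_map`, `greedy_rebalance`, `max_proc_load`) and the per-VP load figures are hypothetical, and the greedy heuristic is only one simple stand-in for what a runtime could do once it is free to reassign VPs.

```python
import heapq

def block_map(num_vps, num_procs):
    """Hypothetical initial placement: contiguous blocks of VPs per processor."""
    return [vp * num_procs // num_vps for vp in range(num_vps)]

def greedy_rebalance(vp_loads, num_procs):
    """Illustrative heuristic (an assumption, not AMPI's strategy):
    place the heaviest VPs first, each on the least-loaded processor."""
    heap = [(0.0, proc) for proc in range(num_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    placement = [None] * len(vp_loads)
    heaviest_first = sorted(range(len(vp_loads)), key=lambda vp: -vp_loads[vp])
    for vp in heaviest_first:
        load, proc = heapq.heappop(heap)
        placement[vp] = proc
        heapq.heappush(heap, (load + vp_loads[vp], proc))
    return placement

def max_proc_load(vp_loads, placement, num_procs):
    """Load of the busiest processor, which bounds the time per step."""
    totals = [0.0] * num_procs
    for vp, proc in enumerate(placement):
        totals[proc] += vp_loads[vp]
    return max(totals)

if __name__ == "__main__":
    vp_loads = [8, 1, 1, 1, 1, 1, 1, 2]  # made-up measured load per VP
    num_procs = 2
    before = block_map(len(vp_loads), num_procs)
    after = greedy_rebalance(vp_loads, num_procs)
    print("busiest processor before:", max_proc_load(vp_loads, before, num_procs))
    print("busiest processor after: ", max_proc_load(vp_loads, after, num_procs))
```

With more VPs than processors, the placement has room to move work around. A one-process-per-processor layout has no such slack, which is the degree of freedom the abstract refers to.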