Energy saving strategies for parallel applications with point-to-point communication phases

Authors:
Vaibhav Sundriyal;Masha Sosonkina;Alexander Gaenko;Zhao Zhang
Venue:
Journal of Parallel and Distributed Computing
Year:
2013



Abstract

Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns on the approach to exascale. Drastic increases in the power consumption of supercomputers significantly affect their operating costs and failure rates. In modern microprocessor architectures equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), power consumption may be controlled in software. Additionally, the network interconnect, such as InfiniBand, may be exploited to maximize energy savings, provided that the application performance loss and the frequency-switching overheads are carefully balanced. This paper advocates a runtime assessment of such overheads by grouping point-to-point communications into phases and then analyzing the time gaps between the communication calls. Certain communication and architectural parameters are taken into account in the three proposed frequency scaling strategies, which differ in their treatment of the time gaps. Experimental results are presented for the NAS parallel benchmark problems as well as for realistic parallel electronic structure calculations performed with the widely used quantum chemistry package GAMESS. For the latter, three different process-to-core mappings were studied with respect to their energy savings under the proposed frequency scaling strategies and under existing state-of-the-art techniques. Energy savings close to the maximum were obtained with a low performance loss of 2% on the given platform.
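
The idea of grouping point-to-point calls into phases by the time gaps between them can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the grouping rule, so the gap threshold `max_gap`, the function names, and the "phase must outlast two frequency switches" test are all hypothetical assumptions chosen for illustration only.

```python
from typing import List, Tuple

# One point-to-point call as (start_time, end_time) in seconds (assumed trace format).
Call = Tuple[float, float]


def group_into_phases(calls: List[Call], max_gap: float) -> List[List[Call]]:
    """Group calls into phases (hypothetical rule): a new phase starts when the
    idle gap between the end of one call and the start of the next exceeds max_gap."""
    phases: List[List[Call]] = []
    for call in sorted(calls):
        if phases and call[0] - phases[-1][-1][1] <= max_gap:
            phases[-1].append(call)   # small gap: same communication phase
        else:
            phases.append([call])     # large gap (or first call): new phase
    return phases


def worth_scaling(phase: List[Call], switch_overhead: float) -> bool:
    """Hypothetical test: lower the frequency only if the phase lasts longer
    than the cost of switching down and back up again."""
    duration = phase[-1][1] - phase[0][0]
    return duration > 2 * switch_overhead


if __name__ == "__main__":
    trace = [(0.0, 1.0), (1.05, 2.0), (5.0, 5.01)]
    for phase in group_into_phases(trace, max_gap=0.1):
        print(len(phase), worth_scaling(phase, switch_overhead=0.02))
```

In this sketch a short, isolated call forms its own phase and is skipped because switching the frequency twice would cost more time than the phase itself lasts; how the three strategies in the paper actually treat such gaps is not specified in the abstract.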