The NAS parallel benchmarks—summary and preliminary results
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
General atomic and molecular electronic structure system
Journal of Computational Chemistry
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
From trace generation to visualization: a performance framework for distributed parallel systems
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Automatic Profiling of MPI Applications with Hardware Performance Counters
Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Enabling the Efficient Use of SMP Clusters: The GAMESS/DDI Model
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Scheduling Processor Voltage and Frequency in Server and Cluster Systems
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
Using multiple energy gears in MPI programs on a power-scalable cluster
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
A Power-Aware Run-Time System for High-Performance Computing
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters
ICPP '07 Proceedings of the 2007 International Conference on Parallel Processing
Bounding energy consumption in large-scale MPI programs
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Prediction models for multi-dimensional power-performance optimization on many cores
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Evaluating high performance communication: a power perspective
Proceedings of the 23rd international conference on Supercomputing
Adagio: making DVS practical for complex HPC applications
Proceedings of the 23rd international conference on Supercomputing
CLUSTER '07 Proceedings of the 2007 IEEE International Conference on Cluster Computing
Energy Profiling and Analysis of the HPC Challenge Benchmarks
International Journal of High Performance Computing Applications
Energy-Efficient Cluster Computing via Accurate Workload Characterization
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications
IEEE Transactions on Parallel and Distributed Systems
RAPL: memory power estimation and capping
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters
ICPP '10 Proceedings of the 2010 39th International Conference on Parallel Processing
Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Per-call energy saving strategies in all-to-all communications
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Hi-index | 0.00 |
Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale. Drastic increases in the power consumption of supercomputers affect significantly their operating costs and failure rates. In modern microprocessor architectures, equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), the power consumption may be controlled in software. Additionally, network interconnect, such as Infiniband, may be exploited to maximize energy savings while the application performance loss and frequency switching overheads must be carefully balanced. This paper advocates for a runtime assessment of such overheads by means of characterizing point-to-point communications into phases followed by analyzing the time gaps between the communication calls. Certain communication and architectural parameters are taken into consideration in the three proposed frequency scaling strategies, which differ with respect to their treatment of the time gaps. The experimental results are presented for NAS parallel benchmark problems as well as for the realistic parallel electronic structure calculations performed by the widely used quantum chemistry package GAMESS. For the latter, three different process-to-core mappings were studied as to their energy savings under the proposed frequency scaling strategies and under the existing state-of-the-art techniques. Close to the maximum energy savings were obtained with a low performance loss of 2% on the given platform.