Over the life of a modern supercomputer, the energy cost of running the system can exceed the cost of the original hardware purchase. This has driven the community to try to understand and minimize energy costs wherever possible. To that end, we present an automated, fine-grained approach to selecting per-loop processor clock frequencies. The clock frequency selection criteria are established through a combination of lightweight static analysis and runtime tracing that automatically acquires application signatures: characterizations of the execution patterns of each loop in an application. Each loop's signature is matched against a series of benchmark loops that have been run on the target system and probe it in various ways. These benchmarks form a covering set, a machine characterization of the expected power consumption and performance of the machine over the space of execution patterns and clock frequencies. For each application loop, the chosen frequency is the one that gives the best power-delay product for the benchmark that most closely resembles that loop. The set of tools that implements this scheme is fully automated, is built on freely available open source software, and uses an inexpensive power measurement apparatus. We use these tools to show measured, system-wide energy savings of up to 7.6% on an 8-core Intel Xeon E5530 and 10.6% on a 32-core AMD Opteron 8380 (a Sun X4600 Node) across a range of workloads.
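The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' tool chain. The two-element signature vectors, the Euclidean distance used for matching, the benchmark names, and all power and runtime numbers are hypothetical and chosen only to make the example concrete. The sketch assumes that each benchmark in the covering set has been measured at every available frequency, and it takes the power-delay product to be average power multiplied by runtime.

```python
from math import dist

def power_delay_product(power_watts, delay_seconds):
    # Assumed metric: average power multiplied by runtime.
    return power_watts * delay_seconds

def best_frequency(measurements):
    # measurements maps frequency (GHz) -> (average power in W, runtime in s).
    # Return the frequency with the lowest power-delay product.
    return min(measurements,
               key=lambda f: power_delay_product(*measurements[f]))

def nearest_benchmark(signature, benchmarks):
    # Assumption: similarity is Euclidean distance between signature vectors.
    return min(benchmarks,
               key=lambda name: dist(signature, benchmarks[name]["signature"]))

def select_frequencies(app_loops, benchmarks):
    # For each application loop, find the closest benchmark loop and adopt
    # the frequency that was best for that benchmark.
    chosen = {}
    for loop_name, signature in app_loops.items():
        match = nearest_benchmark(signature, benchmarks)
        chosen[loop_name] = best_frequency(benchmarks[match]["measurements"])
    return chosen

# Hypothetical covering set. Signature = (cache miss rate, arithmetic share).
benchmarks = {
    "compute_bound": {
        "signature": (0.05, 0.90),
        "measurements": {1.6: (180.0, 10.0), 2.4: (230.0, 6.7)},
    },
    "memory_bound": {
        "signature": (0.60, 0.10),
        "measurements": {1.6: (170.0, 10.0), 2.4: (225.0, 9.5)},
    },
}

# Hypothetical application loop signatures.
app_loops = {
    "stencil": (0.55, 0.15),
    "dense_kernel": (0.10, 0.80),
}

frequencies = select_frequencies(app_loops, benchmarks)
print(frequencies)
```

With these made-up numbers, the memory-bound benchmark loses little runtime at the lower clock, so the loop that resembles it is assigned the lower frequency, while the loop that resembles the compute-bound benchmark keeps the higher one.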