This paper examines energy management for high-performance computing (HPC) applications on a heterogeneous processor with an integrated CPU and GPU. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need to coordinate energy management across distinct core types, a new and less understood problem. We examine the intra-node CPU-GPU frequency sensitivity of HPC applications on tightly coupled CPU-GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU-GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves the measured average energy-delay squared (ED^2) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.
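For readers unfamiliar with the metric, the following is a minimal sketch of how an ED^2 comparison can be computed, assuming the standard definition ED^2 = E * D^2 (energy times execution time squared, lower is better). The function names and the energy and runtime figures are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def ed2(energy_joules, delay_seconds):
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def relative_improvement(baseline, candidate):
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    return (baseline - candidate) / baseline


# Hypothetical run: the managed configuration uses 30% less energy
# but takes 2% longer than the baseline.
baseline_ed2 = ed2(energy_joules=1000.0, delay_seconds=10.0)
managed_ed2 = ed2(energy_joules=700.0, delay_seconds=10.2)
improvement = relative_improvement(baseline_ed2, managed_ed2)

print(f"ED^2 improvement: {improvement:.1%}")
```

Because delay enters squared, even a small slowdown offsets part of the energy saving: in this illustrative case a 30% energy reduction with a 2% slowdown yields an ED^2 improvement of roughly 27%.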