With technology scaling, manufacturers are integrating both CPU and GPU cores on a single chip to improve the throughput of emerging applications. To maximize the throughput of a single-chip heterogeneous processor (SCHP), the chip power budget shared between the CPU and GPU must be utilized effectively. At the same time, the CPU and GPU in an SCHP must each satisfy their own power constraints. Furthermore, the way the power budget is allocated between the CPU and GPU affects performance. In this paper, using a detailed cycle-level SCHP simulator, we first demonstrate that jointly optimizing workload partitioning and power budget partitioning between the CPU and GPU can provide 13% higher throughput than optimizing workload partitioning alone under a fixed power budget allocation to the CPU and GPU. Second, we propose an effective runtime algorithm that can determine near-optimal or optimal combinations of workload and power budget partitioning. The algorithm exploits the runtime power efficiencies of the workload executed on the CPU and the GPU. Using the same cycle-level SCHP simulator, we show that within five to eight kernel invocations the algorithm can achieve 96% of the maximum throughput obtained by an exhaustive search algorithm. Finally, we demonstrate comparable throughput improvements when we apply the algorithm to a commercial computing system with an SCHP.
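To make the distinction between the two optimization problems concrete, the following is a minimal sketch of an exhaustive search over workload and power budget partitioning, compared against workload partitioning alone under a fixed power split. It is not the runtime algorithm proposed in the paper, whose details are not given here. The performance model (`device_rate`, with throughput growing as the square root of power), the coefficients, the budget, the per-device caps, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy model: NOT the paper's simulator or runtime algorithm.
CPU_COEFF = 1.0  # assumed CPU work rate per sqrt(watt)
GPU_COEFF = 3.0  # assumed GPU work rate per sqrt(watt)


def device_rate(power_w, coeff, exponent=0.5):
    """Assumed model: work rate grows sublinearly with allocated power."""
    return coeff * power_w ** exponent


def kernel_throughput(cpu_share, cpu_power, gpu_power):
    """Throughput of one kernel whose work is split between CPU and GPU.

    The kernel finishes when the slower of the two devices finishes.
    """
    cpu_time = cpu_share / device_rate(cpu_power, CPU_COEFF)
    gpu_time = (1.0 - cpu_share) / device_rate(gpu_power, GPU_COEFF)
    return 1.0 / max(cpu_time, gpu_time)


def best_share(cpu_power, gpu_power, steps=100):
    """Workload partitioning alone: search CPU share for a fixed power split."""
    candidates = [i / steps for i in range(steps + 1)]
    share = max(candidates,
                key=lambda s: kernel_throughput(s, cpu_power, gpu_power))
    return kernel_throughput(share, cpu_power, gpu_power), share


def joint_search(budget, cpu_cap, gpu_cap):
    """Exhaustive joint search over power split (1 W steps) and workload share.

    Each device must stay within its own cap, and the two allocations
    together use the whole chip budget.
    """
    best = None
    for cpu_power in range(1, budget):
        gpu_power = budget - cpu_power
        if cpu_power > cpu_cap or gpu_power > gpu_cap:
            continue  # violates a per-device power constraint
        throughput, share = best_share(cpu_power, gpu_power)
        if best is None or throughput > best[0]:
            best = (throughput, share, cpu_power, gpu_power)
    return best


if __name__ == "__main__":
    fixed_tp, fixed_share = best_share(cpu_power=20, gpu_power=20)
    joint_tp, share, cpu_p, gpu_p = joint_search(budget=40, cpu_cap=30, gpu_cap=30)
    print("fixed 20 W / 20 W split:", fixed_tp, "at CPU share", fixed_share)
    print("joint search:", joint_tp, "at CPU share", share,
          "with", cpu_p, "W CPU and", gpu_p, "W GPU")
```

Because the fixed split is one of the candidates the joint search examines, the joint result in this sketch can never be worse than the fixed-split result. How large the gap is depends entirely on the assumed model, so the numbers it prints say nothing about the 13% figure reported above.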