Traditional compiler approaches to optimizing power efficiency aim to adjust voltage and frequency at runtime to match code characteristics to the hardware (e.g., running memory-bound phases at a lower frequency). However, such approaches are constrained by three factors: (i) voltage-frequency transitions are too slow to be applied at instruction granularity, (ii) larger code regions are seldom unequivocally memory- or compute-bound, and (iii) the available voltage scaling range for future technologies is rapidly shrinking. These factors call for new approaches that address power efficiency at the code-generation level. This paper proposes one such approach, which automatically generates power-efficient code using a decoupled access/execute (DAE) model. In DAE, a program is split into tasks, each consisting of two phases coarse-grained enough to enable effective dynamic voltage and frequency scaling (DVFS): (i) the access phase, which prefetches data (heavily memory-bound), and (ii) the execute phase, which performs the actual computation (heavily compute-bound). Our contribution is a compiler methodology that automatically generates the access phases for a task-based programming system. Our approach handles both affine code (through polyhedral analysis) and non-affine code (through optimized task skeletons). Our evaluation shows that the automatically generated versions improve the energy-delay product (EDP) by 25% on average compared to coupled execution, without any performance degradation, and surpass the EDP savings of the corresponding hand-crafted tasks by 5%.
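To make the access/execute split concrete, here is a minimal toy model of one DAE task in Python. It is a sketch under stated assumptions and not the authors' compiler or runtime. The access phase is written by hand as a "skeleton" that keeps only the address computation of the task's loop, whereas the abstract describes generating it automatically. The `ToyCache` (unbounded, 8 elements per line), the frequency labels `F_LOW`/`F_HIGH`, and the `set_frequency` hook are all hypothetical stand-ins for real hardware and a real DVFS interface.

```python
"""Toy model of decoupled access/execute (DAE) for a single task.

Hypothetical assumptions: a cache that never evicts, 8 array elements per
cache line, and a set_frequency() hook standing in for a real DVFS interface.
"""
from typing import List, Set, Tuple

ELEMS_PER_LINE = 8                  # assumed cache-line size, in elements
F_LOW, F_HIGH = "f_min", "f_max"    # placeholder frequency levels


class ToyCache:
    """Counts first-touch misses per cache line; lines are never evicted."""

    def __init__(self) -> None:
        self.lines: Set[int] = set()
        self.misses = 0

    def touch(self, elem: int) -> None:
        line = elem // ELEMS_PER_LINE
        if line not in self.lines:
            self.misses += 1
            self.lines.add(line)


freq_log: List[str] = []


def set_frequency(level: str) -> None:
    # Hypothetical DVFS hook: only records the requested level.
    freq_log.append(level)


def access_phase(idx: List[int], lo: int, hi: int, cache: ToyCache) -> None:
    # Skeleton of the task: keep the (non-affine) address computation,
    # drop the arithmetic and any stores; each load becomes a "prefetch".
    for i in range(lo, hi):
        cache.touch(idx[i])


def execute_phase(a: List[float], idx: List[int], lo: int, hi: int,
                  cache: ToyCache) -> float:
    # Original task body: the same loads plus the actual computation.
    total = 0.0
    for i in range(lo, hi):
        cache.touch(idx[i])
        total += 2.0 * a[idx[i]]
    return total


def run_task_dae(a: List[float], idx: List[int], lo: int,
                 hi: int) -> Tuple[float, int, int]:
    """Returns (result, access-phase misses, execute-phase misses)."""
    cache = ToyCache()
    set_frequency(F_LOW)            # memory-bound phase: run slow
    access_phase(idx, lo, hi, cache)
    access_misses = cache.misses
    cache.misses = 0
    set_frequency(F_HIGH)           # compute-bound phase: run fast
    result = execute_phase(a, idx, lo, hi, cache)
    return result, access_misses, cache.misses


def run_task_coupled(a: List[float], idx: List[int], lo: int,
                     hi: int) -> Tuple[float, int]:
    """Returns (result, misses) when loads and compute are interleaved."""
    cache = ToyCache()
    result = execute_phase(a, idx, lo, hi, cache)
    return result, cache.misses
```

With `a = [float(x) for x in range(64)]` and the irregular index `idx = [(7 * i) % 64 for i in range(32)]`, `run_task_coupled(a, idx, 0, 32)` returns `(1952.0, 8)` and `run_task_dae(a, idx, 0, 32)` returns `(1952.0, 8, 0)`. The result is the same, but all eight line misses move into the access phase, which runs at the low frequency, leaving a miss-free execute phase for the high frequency. Note that the skeleton only reads: because it has no side effects, it can safely run ahead of the execute phase.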