Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Effective cluster assignment for modulo scheduling
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A static power model for architects
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Power-aware modulo scheduling for high-performance VLIW processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Low swing dual threshold voltage domino logic
Proceedings of the 12th ACM Great Lakes symposium on VLSI
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Exploiting VLIW schedule slacks for dynamic and leakage energy reduction
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cluster assignment for high-performance embedded VLIW processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Power-Driven Challenges in Nanometer Design
IEEE Design & Test
Design Challenges of Technology Scaling
IEEE Micro
Optimizing Static Power Dissipation by Functional Units in Superscalar Processors
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Managing static leakage energy in microprocessor functional units
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Adapting instruction level parallelism for optimizing leakage in VLIW architectures
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Region-based hierarchical operation partitioning for multicluster processors
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Efficient Backtracking Instruction Schedulers
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Instruction Scheduling for Clustered VLIW DSPs
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
CARS: A New Code Generation Framework for Clustered ILP Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Integrated temporal and spatial scheduling for extended operand clustered VLIW processors
Proceedings of the 1st conference on Computing frontiers
A Graph Matching Based Integrated Scheduling Framework for Clustered VLIW Processors
ICPPW '04 Proceedings of the 2004 International Conference on Parallel Processing Workshops
Transition aware scheduling: increasing continuous idle-periods in resource units
Proceedings of the 2nd conference on Computing frontiers
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors
Journal of Systems and Software
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Miniaturization of devices and the ensuing decrease in the threshold voltage has led to a substantial increase in the leakage component of the total processor energy consumption. Relatively simpler issue logic and the presence of a large number of function units in the VLIW and the clustered VLIW architectures attribute a large fraction of this leakage energy consumption in the functional units. However, functional units are not fully utilized in the VLIW architectures because of the inherent variations in the ILP of the programs. This underutilization is even more pronounced in the context of clustered VLIW architectures because of the contentions for the limited number of slow intercluster communication channels which lead to many short idle cycles.In the past, some architectural schemes have been proposed to obtain leakage energy bene .ts by aggressively exploiting the idleness of functional units. However, presence of many short idle cycles cause frequent transitions from the active mode to the sleep mode and vice-versa and adversely a ffects the energy benefits of a purely hardware based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assist such a hardware based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction of energy consumption of functional units with negligible performance degradation over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.