Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Custom-fit processors: letting applications define architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory data organization for improved cache performance in embedded processor applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Instruction buffering to reduce power in processors for signal processing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
System-level power optimization: techniques and tools
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Storage Management Programmable Process
Storage Management Programmable Process
Design Challenges for New Application-Specific Processors
IEEE Design & Test
Instruction Scheduling for Clustered VLIW DSPs
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Effective Hardware-Based Two-Way Loop Cache for High Performance Low Power Processors
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Application-specific clustered VLIW datapaths: early exploration on a parameterized design space
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache
Proceedings of the international symposium on Code generation and optimization
DRIM: a low power dynamically reconfigurable instruction memory hierarchy for embedded systems
Proceedings of the conference on Design, automation and test in Europe
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Efficient Method to Generate an Energy Efficient Schedule Using Operation Shuffling
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Playing the trade-off game: Architecture exploration using Coffeee
ACM Transactions on Design Automation of Electronic Systems (TODAES)
COFFEE: compiler framework for energy-aware exploration
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Enabling large decoded instruction loop caching for energy-aware embedded processors
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Embedded Systems Design
DLIC: Decoded loop instructions caching for energy-aware embedded processors
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
For multimedia applications, loop buffering is an efficient mechanism to reduce the power in the instruction memory of embedded processors. In particular, software controlled clustered loop buffers are energy efficient. However current compilers for VLIW do not fully exploit the potentials offered by such a clustered organization This paper presents an algorithm to explore what is the optimal loop buffer configuration and the optimal way to use this configuration for an application or a set of applications. Results for the MediaBench application suite show an additional 18% reduction (on average) in energy in the instruction memory hierarchy as compared to traditional non-clustered approaches to the loop buffer without compromising performance.