Instruction buffering exploration for low energy VLIWs with instruction clusters

Authors:
Tom Vander Aa;Murali Jayapala;Francisco Barat;Geert Deconinck;Rudy Lauwereins;Francky Catthoor;Henk Corporaal
Affiliations:
K.U.Leuven/ESAT, Heverlee, Arenberg, Belgium;K.U.Leuven/ESAT, Heverlee, Arenberg, Belgium;K.U.Leuven/ESAT, Heverlee, Arenberg, Belgium;K.U.Leuven/ESAT, Heverlee, Arenberg, Belgium;IMEC vzw, Heverlee, Belgium;IMEC vzw, Heverlee, Belgium;TU Eindhoven, AZ Eindhoven, Netherlands
Venue:
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Year:
2004

Abstract

For multimedia applications, loop buffering is an efficient mechanism for reducing the power consumed in the instruction memory of embedded processors. In particular, software-controlled clustered loop buffers are energy efficient. However, current compilers for VLIW processors do not fully exploit the potential offered by such a clustered organization. This paper presents an algorithm that explores the optimal loop buffer configuration, and the optimal way to use that configuration, for a single application or a set of applications. Results for the MediaBench application suite show an additional 18% average reduction in the energy of the instruction memory hierarchy compared to traditional non-clustered loop buffers, without compromising performance.
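The trade-off behind clustering can be illustrated with a deliberately simplified model. The sketch below is not the paper's exploration algorithm. It assumes a hypothetical cost model in which every cluster's local controller costs `e_ctrl` per cycle, and a cluster's loop buffer bank is read only in cycles where at least one of its issue slots holds a non-NOP operation, at a cost of `e_fixed + e_slot * width`. All function names, cost values, and the example schedule are illustrative assumptions. The search exhaustively enumerates every partition of the issue slots into clusters and keeps the cheapest one. Because the number of partitions grows with the Bell number, this brute-force approach is only practical for small issue widths.

```python
from typing import Iterator, List, Sequence, Set, Tuple

Partition = List[List[int]]


def set_partitions(items: List[int]) -> Iterator[Partition]:
    """Yield every partition of `items` into non-empty blocks (clusters)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing cluster in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a cluster of its own.
        yield [[first]] + partition


def loop_buffer_energy(clusters: Partition, schedule: Sequence[Set[int]],
                       e_ctrl: float = 0.5, e_fixed: float = 1.0,
                       e_slot: float = 1.0) -> float:
    """Energy of one pass over `schedule` under a hypothetical cost model.

    `schedule` lists, per cycle, the issue slots holding a non-NOP operation.
    Every cluster pays `e_ctrl` per cycle for its local controller (assumed);
    its buffer bank is read, at cost e_fixed + e_slot * width, only in cycles
    where at least one of its slots is active.
    """
    energy = 0.0
    for active in schedule:
        for cluster in clusters:
            energy += e_ctrl
            if any(slot in active for slot in cluster):
                energy += e_fixed + e_slot * len(cluster)
    return energy


def explore(num_slots: int, schedule: Sequence[Set[int]],
            **costs: float) -> Tuple[Partition, float]:
    """Exhaustively search all clusterings of the slots; return the cheapest."""
    best = min(set_partitions(list(range(num_slots))),
               key=lambda p: loop_buffer_energy(p, schedule, **costs))
    return best, loop_buffer_energy(best, schedule, **costs)


# Hypothetical 4-slot loop body: slots 0 and 1 are busy every cycle,
# while slots 2 and 3 issue only occasionally.
SCHEDULE = [{0, 1}, {0, 1}, {0, 1, 2}, {0, 1}, {0, 1, 3}, {0, 1}]

if __name__ == "__main__":
    clusters, energy = explore(4, SCHEDULE)
    print(clusters, energy)
```

With these assumed costs, the search groups slots {0, 1} and {2, 3}, at 30.0 units per loop iteration. By comparison, a single unclustered buffer costs 33.0 units and one cluster per slot costs 40.0 units. This shows how clustering pays off when slot activity is uneven, while too many clusters lose to controller overhead.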