Reducing instruction fetch energy with backwards branch control information and buffering

Authors:
Jude A. Rivers;Sameh Asaad;John-David Wellman;Jaime H. Moreno
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
Proceedings of the 2003 international symposium on Low power electronics and design
Year:
2003

Abstract

Many emerging applications, e.g., in the embedded and DSP space, are loop-intensive: a substantial part of their execution time is spent within a few program phases. Loop buffering techniques have been proposed to capture these loops in small buffers and process them from there, reducing the processor's instruction fetch energy. However, these schemes are limited to straight-line or innermost loops and fail to adequately handle complex loops. In this paper, we propose a dynamic loop buffering (DLB) mechanism that uses backwards branch control information to identify, capture, and process complex loop structures. The DLB controller has been fully implemented in VHDL, synthesized and timed with the IBM BooleDozer and EinsTimer synthesis tools, and analyzed for power with the Sequence PowerTheater tool. Our experiments show that the DLB approach, on average, reduces energy consumption by a factor of 3 compared to a traditional instruction memory design, at an area overhead of about 9%.
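
As a rough illustration of the general loop buffering idea, and not of the DLB controller described in the paper, the sketch below counts how many instruction fetches a small buffer could serve once a taken backward branch reveals a loop. Everything in it is an assumption made for illustration: the function name `simulate`, the trace format (a list of instruction addresses, one unit per instruction), and the policy of filling the buffer during the iteration after the loop is detected. It handles only a single simple loop at a time, which is exactly the limitation the abstract attributes to earlier schemes.

```python
def simulate(trace, capacity):
    """Count fetches served by a hypothetical loop buffer vs. instruction memory.

    trace    -- list of fetched instruction addresses (assumed: 1 unit per instruction)
    capacity -- buffer size in instructions (assumed parameter)
    """
    start = end = None            # address range of the loop currently tracked
    captured = set()              # addresses already copied into the buffer
    counts = {"buffer": 0, "memory": 0}
    prev = None

    for pc in trace:
        backward = prev is not None and pc <= prev
        if backward and (start, end) != (pc, prev):
            # A taken backward branch to a new target marks a candidate loop.
            if prev - pc + 1 <= capacity:
                start, end = pc, prev
            else:
                start = end = None        # loop body does not fit in the buffer
            captured = set()
        elif start is not None and not (start <= pc <= end):
            # Control flow left the tracked loop: stop using the buffer.
            start = end = None
            captured = set()

        if start is not None and pc in captured:
            counts["buffer"] += 1         # served from the small buffer
        else:
            counts["memory"] += 1         # served from instruction memory
            if start is not None:
                captured.add(pc)          # fill the buffer while fetching
        prev = pc

    return counts


# Two setup instructions, a 3-instruction loop run 4 times, two exit instructions.
example_trace = [0, 1] + [2, 3, 4] * 4 + [5, 6]
print(simulate(example_trace, capacity=4))  # prints {'buffer': 6, 'memory': 10}
```

In this toy trace the first iteration runs from memory, the second fills the buffer, and the remaining two iterations are served from it. The count says nothing about actual energy, which depends on the relative cost of a buffer access and a memory access.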