The power consumption of microprocessors has increased in step with the complexity of each successive generation. In general-purpose processors, this is primarily attributable to the high energy consumption of the fetch and decode circuitry, driven by the high instruction issue rates these high-performance processors demand. The predictive Decode Filter Cache (DFC) has been shown to be effective in reducing the fetch and decode energy consumed by the instruction cache hierarchy of in-order, single-issue processors. In this paper we propose architectural enhancements that incorporate the DFC into wide-issue superscalar processors for an energy-efficient memory hierarchy. Extensive simulations on the modified superscalar architecture show that the predictor-based DFC reduces L1 fetch energy by an average of 17.33% and 25.09%, and the number of decodes by 37.2% and 46.6%, for 64- and 128-instruction DFCs respectively. These fetch and decode energy savings come at a minimal cost in average Instructions Per Cycle (IPC): reductions of 0.54% and 0.73% for the 64- and 128-instruction DFCs on the selected SPEC2000 benchmarks.
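The core mechanism behind these savings can be illustrated with a minimal sketch: a small, direct-mapped cache of already-decoded instructions is consulted before the L1 instruction cache, so that a DFC hit skips both the L1 fetch and the decoder. This is a simplified illustration, not the paper's simulator; the cache geometry, the synthetic loop trace, and all names here are illustrative assumptions.

```python
# Sketch of a decode filter cache (DFC): a small direct-mapped cache of
# decoded instruction lines. A hit reuses the decoded micro-ops,
# avoiding one L1 I-cache read and one decode; a miss fetches from L1,
# decodes, and fills the line. Sizes and the trace are hypothetical.

class DecodeFilterCache:
    def __init__(self, num_lines=64, line_words=4):
        self.num_lines = num_lines
        self.line_words = line_words
        self.tags = [None] * num_lines   # tag per line; None = invalid
        self.hits = 0
        self.accesses = 0

    def access(self, pc):
        """Look up the decoded line holding instruction address `pc`."""
        self.accesses += 1
        line_addr = pc // self.line_words
        idx = line_addr % self.num_lines
        if self.tags[idx] == line_addr:
            self.hits += 1
            return True                  # served from DFC: no L1 fetch, no decode
        self.tags[idx] = line_addr       # miss: fetch from L1, decode, fill
        return False

# Synthetic trace dominated by a tight 32-instruction loop, the kind of
# loop-heavy code a DFC captures well, plus one cold excursion.
dfc = DecodeFilterCache(num_lines=64)
trace = []
for _ in range(1000):
    trace.extend(range(100, 132))        # loop body, executed 1000 times
trace.extend(range(5000, 5100))          # cold straight-line code
for pc in trace:
    dfc.access(pc)

hit_rate = dfc.hits / dfc.accesses
decodes_avoided = dfc.hits               # each hit skips one decode
print(f"DFC hit rate: {hit_rate:.1%}, decodes avoided: {decodes_avoided}")
```

On this loop-dominated trace nearly every access hits after the first loop iteration warms the DFC, which is why the savings in the paper scale with how much of the dynamic instruction stream a 64- or 128-instruction DFC can hold.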