Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Stage-skip pipeline: a low power processor architecture using a decoded instruction buffer
ISLPED '96 Proceedings of the 1996 international symposium on Low power electronics and design
Proceedings of the 24th annual international symposium on Computer architecture
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Instruction buffering to reduce power in processors for signal processing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Effective Hardware-Based Two-Way Loop Cache for High Performance Low Power Processors
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Power Savings in Embedded Processors through Decode Filer Cache
Proceedings of the conference on Design, automation and test in Europe
Power Issues Related to Branch Prediction
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example
IEEE Computer Architecture Letters
EURASIP Journal on Applied Signal Processing
Low power microarchitecture with instruction reuse
Proceedings of the 5th conference on Computing frontiers
Reusing cached schedules in an out-of-order processor with in-order issue logic
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Hi-index | 0.00 |
In this paper, we propose a new issue queue design that is capable of scheduling reusable instructions. Once the issue queue is reusing instructions, no instruction cache access is needed since the instructions are supplied by the issue queue itself. Furthermore, dynamic branch prediction and instruction decoding can also be avoided permitting the gating of the front-end stages of the pipeline (the stages before register renaming). Results using array-intensive codes show that up to 82% of the total execution cycles, the pipeline front-end can be gated, providing a power reduction of 72% in the instruction cache, 33% in the branch predictor, and 21% in the issue queue, respectively, at a small performance cost. Our analysis of compiler optimizations indicates that the power savings can be further improved by using optimized code.