The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Proceedings of the 14th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Energy reduction in queues and stacks by adaptive bitwidth compression
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Efficient dynamic scheduling through tag elimination
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A high-speed dynamic instruction scheduling scheme for superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Energy-efficient hybrid wakeup logic
Proceedings of the 2002 international symposium on Low power electronics and design
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
An Adaptive Issue Queue for Reduced Power at High Performance
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Instruction issue logic for pipelined supercomputers
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
CMP off-chip bandwidth scheduling guided by instruction criticality
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Instruction queues consume a significant amount of power in a high-performance processor. The wakeup logic delay is also a critical timing parameter. This paper compares a commonly used CAM-based instruction queue organization with a new pointer-based design for delay and energy efficiency. A design and pre-layout of all critical structures in 70nm technology is performed for both organizations. The pointerbased design is shown to use 10 to 15 times less power than the CAM-based design, depending on queue size, for a 4-wide issue, 5GHz processor. The results also demonstrate the importance of evaluating all steps of instruction queue access: allocation, issue and wakeup rather than wakeup alone, especially for power consumption.