Mainstream processors implement the instruction scheduler as a monolithic CAM-based issue queue (IQ), which consumes increasingly more energy as its size scales. In particular, the instruction wakeup logic accounts for a major portion of that energy. Our study shows that instructions with two non-ready operands (called 2OP instructions) make up a small percentage of all instructions but tend to spend a long time in the IQ. They can be effectively shelved in a small RAM-based waiting instruction buffer (WIB) and steered into the IQ at the appropriate time. With this two-level shelving, half of the CAM tag comparators in the IQ are eliminated, which significantly reduces the energy of the wakeup operation. In addition, we propose an adaptive banking scheme that downsizes the IQ and reduces the bit-width of the tag comparators. Experiments indicate that for an 8-wide-issue superscalar or SMT processor, the energy consumption of the instruction scheduler can be reduced by 67%. Furthermore, the new design potentially allows a faster scheduler clock while maintaining IPC close to that of the monolithic scheduler design. Compared with previous work on eliminating tags through prediction, our design is superior in both energy reduction and SMT support.
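The two-level shelving idea can be illustrated with a small behavioral sketch. This is not the paper's implementation. The class and method names (`Instr`, `TwoLevelScheduler`, `dispatch`, `broadcast`, `issue_ready`) are hypothetical, each instruction is assumed to have at most two source operands, and the steering policy (move a 2OP instruction from the WIB to the IQ as soon as one of its operands becomes ready) is an assumption, since the abstract only says instructions are steered "at the appropriate time". Banking, queue capacities, and energy are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    srcs: tuple  # source operand tags (assumed: at most two)
    dest: str    # destination tag


class TwoLevelScheduler:
    """Behavioral sketch: IQ entries wait on at most one tag; 2OP instructions wait in the WIB."""

    def __init__(self, ready_regs):
        self.ready = set(ready_regs)  # tags whose values are already available
        self.iq = []                  # issue queue: at most one pending operand per entry
        self.wib = []                 # waiting instruction buffer: two pending operands

    def pending(self, ins):
        # Operands of `ins` that are not yet ready.
        return [s for s in ins.srcs if s not in self.ready]

    def dispatch(self, ins):
        # Shelve 2OP instructions in the WIB; everything else goes straight to the IQ.
        if len(self.pending(ins)) == 2:
            self.wib.append(ins)
        else:
            self.iq.append(ins)

    def broadcast(self, tag):
        # Mark `tag` ready, then steer WIB entries that now have fewer than
        # two pending operands into the IQ (assumed steering policy).
        self.ready.add(tag)
        still_waiting = []
        for ins in self.wib:
            if len(self.pending(ins)) < 2:
                self.iq.append(ins)
            else:
                still_waiting.append(ins)
        self.wib = still_waiting

    def issue_ready(self):
        # Remove and return the names of IQ entries with no pending operands.
        ready_now = [i for i in self.iq if not self.pending(i)]
        self.iq = [i for i in self.iq if self.pending(i)]
        return [i.name for i in ready_now]
```

Under these assumptions, an instruction such as `Instr("mul", ("r2", "r4"), "r5")` dispatched while neither `r2` nor `r4` is ready stays in the WIB, and enters the IQ only after `broadcast("r2")` or `broadcast("r4")`. Because every IQ entry then waits on at most one tag, one comparator per entry suffices in this model, which is the source of the halved comparator count described above.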