Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Distributed Reorder Buffer Schemes for Low Power
ICCD '03 Proceedings of the 21st International Conference on Computer Design
Checkpointed Early Load Retirement
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Fast thread migration via cache working set prediction
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Dynamic parallelization of JavaScript applications using an ultra-lightweight speculation mechanism
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Hi-index | 0.00 |
Modern microprocessors achieve high application performance at an acceptable level of power dissipation. Reorder buffer is used for out-of-order instructions to be committed in-order. The reorder buffer plays a key role in modern microprocessors because performance improvement techniques highly rely on aggressive speculation to feed wider issue, out-of-order, and deep pipelines. In terms of power to performance trade-off, reorder buffer is particularly important. This is because enlarging the reorder buffer size achieves high performance but naive scaling of the conventional reorder buffer architecture can severely increase the complexity and power consumption. In this paper, we propose low-power reorder buffer techniques for contemporary microprocessors. First, the separated reorder buffer reduces power dissipation by deferred allocation and early release. The deferred allocation delays the SROB allocation of instructions until all their data dependencies are resolved. Then, the instructions are executed in program order and they are released faster from the SROB. The result of the instruction is written into rename buffers immediately after the execution completes. Then, the result values in the rename buffer are written into the architectural register file at the commit state. The proposed approaches in this paper provide higher resource utilization and low power consumption.