Tuning the continual flow pipeline architecture with virtual register renaming
ACM Transactions on Architecture and Code Optimization (TACO)
Continual Flow Pipelines (CFP) allow a processor core to process instruction windows of hundreds of in-flight instructions while keeping its cycle-critical scheduler and register file small. CFP defers the execution of instructions that depend on cache misses, moving them to a single-ported SRAM buffer outside the pipeline, and continues executing miss-independent instructions until the miss data is fetched into the cache. In this way, a CFP core avoids pipeline stalls due to cache misses while keeping the multi-ported, power-hungry structures in the pipeline small. However, when the miss data arrives, a CFP core has to wake up the deferred miss-dependent instructions and replay them through the execution pipeline, which increases the circuit activity of the execution pipeline and, consequently, core energy consumption. In this paper, we present and evaluate virtual register renaming as a substrate for CFP cores that significantly shortens the replay loop and reduces the circuit activity of deferred miss-dependent instructions, thus increasing the energy efficiency of Continual Flow Pipeline architectures.
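The deferral idea described in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the `Instr` record, the `first_pass` function, the register names, and the use of a "poisoned register" set to track miss dependence are all assumptions made for illustration, and the final count of pipeline passes assumes, as a simplification, that every deferred instruction traverses the pipeline exactly twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    name: str
    dest: str
    srcs: tuple = ()
    misses: bool = False  # True for a load that misses in the cache


def first_pass(program):
    """Split a program into instructions issued now and instructions deferred.

    A register is 'poisoned' while its value depends on an outstanding miss.
    """
    poisoned = set()
    issued = []    # sent down the pipeline during the first pass
    deferred = []  # stand-in for the deferred-instruction buffer, in program order
    for ins in program:
        if ins.misses:
            poisoned.add(ins.dest)      # result unavailable until the miss returns
            issued.append(ins.name)     # the load itself goes out to memory
        elif poisoned.intersection(ins.srcs):
            poisoned.add(ins.dest)      # miss dependence propagates to the result
            deferred.append(ins.name)
        else:
            poisoned.discard(ins.dest)  # overwritten by a miss-independent value
            issued.append(ins.name)
    return issued, deferred


program = [
    Instr("ld", "r1", ("r0",), misses=True),
    Instr("add", "r2", ("r1", "r3")),   # depends on the missing load
    Instr("mul", "r4", ("r5", "r6")),   # independent of the miss
    Instr("sub", "r7", ("r2", "r4")),   # depends on the deferred add
    Instr("mov", "r1", ("r5",)),        # overwrites r1, clearing its poison
    Instr("add2", "r8", ("r1", "r4")),  # independent again
]

issued, deferred = first_pass(program)

# Simplifying assumption: a deferred instruction passes through the pipeline
# once when it is deferred and once more when it is replayed.
pipeline_passes = len(program) + len(deferred)
```

In this model the replay cost appears as the extra `len(deferred)` term in `pipeline_passes`, which corresponds to the added circuit activity the abstract attributes to replaying miss-dependent instructions.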