Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor
IEEE Computer Architecture Letters
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Kilo-instruction processors, runahead and prefetching
Proceedings of the 3rd conference on Computing frontiers
Improving single-thread performance with fine-grain state maintenance
Proceedings of the 5th conference on Computing frontiers
Off-loading application controlled data prefetching in numerical codes for multi-core processors
International Journal of Computational Science and Engineering
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Memory-level parallelism aware fetch policies for simultaneous multithreading processors
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
MLP-aware dynamic instruction window resizing for adaptively exploiting both ILP and MLP
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Anti-caching: a new approach to database management system architecture
Proceedings of the VLDB Endowment
Tuning the continual flow pipeline architecture with virtual register renaming
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Runahead execution is a technique that improves processor performance by pre-executing the running application instead of stalling the processor when a long-latency cache miss occurs. Previous research has shown that this technique significantly improves processor performance. However, the efficiency of runahead execution, which directly affects the dynamic energy consumed by a runahead processor, has not been explored. A runahead processor executes significantly more instructions than a traditionalout-of-order processor, sometimes without providing any performance benefit, which makes it inefficient. In this paper, we describe the causes of inefficiency in runahead execution and propose techniques to make a runahead processor more efficient, thereby reducing its energy consumption and possibly increasing its performance. Our analyses and results provide two major insights: (1) the efficiency of runahead execution can be greatly improved with simple techniques that reduce the number of short, overlapping, and useless runahead periods, which we identify as the three major causes of inefficiency, (2) simple optimizations targeting the increase of useful prefetches generated in runahead mode can increase both the performance and efficiency of a runahead processor. The techniques we propose reduce the increase in the number of instructions executed due to runahead execution from 26.5% to 6.2%, on average, without significantly affecting the performance improvement provided by runahead execution.