Techniques for Efficient Processing in Runahead Execution Engines

Authors:
Onur Mutlu;Hyesoon Kim;Yale N. Patt
Affiliations:
University of Texas at Austin;University of Texas at Austin;University of Texas at Austin
Venue:
Proceedings of the 32nd annual international symposium on Computer Architecture
Year:
2005

Abstract

Runahead execution is a technique that improves processor performance by pre-executing the running application instead of stalling the processor when a long-latency cache miss occurs. Previous research has shown that this technique significantly improves processor performance. However, the efficiency of runahead execution, which directly affects the dynamic energy consumed by a runahead processor, has not been explored. A runahead processor executes significantly more instructions than a traditional out-of-order processor, sometimes without providing any performance benefit, which makes it inefficient. In this paper, we describe the causes of inefficiency in runahead execution and propose techniques to make a runahead processor more efficient, thereby reducing its energy consumption and possibly increasing its performance. Our analyses and results provide two major insights: (1) the efficiency of runahead execution can be greatly improved with simple techniques that reduce the number of short, overlapping, and useless runahead periods, which we identify as the three major causes of inefficiency; and (2) simple optimizations that increase the number of useful prefetches generated in runahead mode can improve both the performance and the efficiency of a runahead processor. The techniques we propose reduce the increase in the number of instructions executed due to runahead execution from 26.5% to 6.2%, on average, without significantly affecting the performance improvement provided by runahead execution.
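
As a rough illustration of the efficiency figures quoted in the abstract, the short Python sketch below computes the percentage increase in executed instructions relative to a baseline, and the relative reduction implied by going from 26.5% to 6.2%. The function names and the instruction counts in the usage lines are hypothetical, chosen only for illustration; only the 26.5% and 6.2% figures come from the abstract.

```python
def extra_instruction_overhead(baseline_insts: int, runahead_insts: int) -> float:
    """Percentage increase in executed instructions over the baseline.

    Both arguments are hypothetical instruction counts: one for a
    conventional out-of-order processor, one for a runahead processor
    running the same program.
    """
    if baseline_insts <= 0:
        raise ValueError("baseline_insts must be positive")
    return 100.0 * (runahead_insts - baseline_insts) / baseline_insts


def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Percentage by which the overhead itself shrinks (before -> after)."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return 100.0 * (before_pct - after_pct) / before_pct


if __name__ == "__main__":
    # Made-up counts that reproduce the 26.5% average overhead from the abstract.
    print(extra_instruction_overhead(1_000_000, 1_265_000))
    # Overhead figures taken from the abstract: 26.5% before, 6.2% after.
    print(round(relative_reduction(26.5, 6.2), 1))
```

Under these assumptions, the proposed techniques remove roughly three quarters of the extra instructions that runahead execution would otherwise add.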