Runahead Execution: An Effective Alternative to Large Instruction Windows

Authors:
Onur Mutlu;Jared Stark;Chris Wilkerson;Yale N. Patt
Affiliations:
The University of Texas at Austin;Intel Microarchitecture Research Lab;Intel Microarchitecture Research Lab;The University of Texas at Austin
Venue:
IEEE Micro
Year:
2003

Cited by (15)

Cache Refill/Access Decoupling for Vector Machines
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture

Toward kilo-instruction processors
ACM Transactions on Architecture and Code Optimization (TACO)

Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture

Techniques for Efficient Processing in Runahead Execution Engines
Proceedings of the 32nd annual international symposium on Computer Architecture

Store-Ordered Streaming of Shared Memory
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques

Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture

Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance
IEEE Micro

Spatial Memory Streaming
Proceedings of the 33rd annual international symposium on Computer Architecture

Improving single-thread performance with fine-grain state maintenance
Proceedings of the 5th conference on Computing frontiers

Temporal instruction fetch streaming
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture

Memory management thread for heap allocation intensive sequential applications
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture

Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture

Scalable memory registration for high performance networks using helper threads
Proceedings of the 8th ACM International Conference on Computing Frontiers

Proactive instruction fetch
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Abstract

An instruction window large enough to tolerate DRAM latencies is prohibitively complex and power-hungry. To avoid having to build such large windows, runahead execution uses otherwise-idle clock cycles to achieve an average 22 percent performance improvement for processors with instruction windows of contemporary sizes. The technique incurs only a small hardware cost and does not significantly increase the processor's complexity.