Efficient runahead threads

Authors:
Tanausú Ramirez;Alex Pajuelo;Oliverio Jesus Santana;Onur Mutlu;Mateo Valero
Affiliations:
Universitat Politecnica de Catalunya (UPC), Barcelona, Spain;Universitat Politecnica de Catalunya (UPC), Barcelona, Spain;Universidad de Las Palmas de Gran Canaria (ULPGC), Las Palmas de Gran Canaria, Spain;Carnegie Mellon University (CMU), Pittsburgh, PA, USA;Universitat Politecnica de Catalunya (UPC), Barcelona, Spain
Venue:
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Year:
2010

Citing 18
Cited 0

Hitting the memory wall: implications of the obvious

ACM SIGARCH Computer Architecture News
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Increasing superscalar performance through multistreaming

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Memory-system design considerations for dynamically-scheduled processors

Proceedings of the 24th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors

IEEE Micro
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism

Proceedings of the 31st annual international symposium on Computer architecture
Mitigating Amdahl's Law through EPI Throttling

Proceedings of the 32nd annual international symposium on Computer Architecture
Techniques for Efficient Processing in Runahead Execution Engines

Proceedings of the 32nd annual international symposium on Computer Architecture
High-Performance Throughput Computing

IEEE Micro
A Memory-Level Parallelism Aware Fetch Policy for SMT Processors

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers

Quantified Score

Hi-index	0.00

Visualization

Abstract

Runahead Threads (RaT) is a promising solution that enables a thread to speculatively run ahead and prefetch data instead of stalling for a long-latency load in a simultaneous multithreading processor. With this capability, RaT can reduces resource monopolization due to memory-intensive threads and exploits memory-level parallelism, improving both system performance and single-thread performance. Unfortunately, the benefits of RaT come at the expense of increasing the number of executed instructions, which adversely affects its energy efficiency. In this paper, we propose Runahead Distance Prediction (RDP), a simple technique to improve the efficiency of Runahead Threads. The main idea of the RDP mechanism is to predict how far a thread should run ahead speculatively such that speculative execution is useful. By limiting the runahead distance of a thread, we generate efficient runahead threads that avoid unnecessary speculative execution and enhance RaT energy efficiency. By reducing runahead-based speculation when it is predicted to be not useful, RDP also allows shared resources to be efficiently used by non-speculative threads. Our results show that RDP significantly reduces power consumption while maintaining the performance of RaT, providing better performance and energy balance than previous proposals in the field.