MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

Authors:
Kenzo Craeynest;Stijn Eyerman;Lieven Eeckhout
Affiliations:
Department of Electronics and Information Systems (ELIS), Ghent University, Belgium;Department of Electronics and Information Systems (ELIS), Ghent University, Belgium;Department of Electronics and Information Systems (ELIS), Ghent University, Belgium
Venue:
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Year:
2008

Abstract

Threads experiencing long-latency loads on a simultaneous multithreading (SMT) processor may clog shared processor resources without making forward progress, thereby starving other threads and reducing overall system throughput. An elegant solution to the long-latency load problem in SMT processors is to employ runahead execution. Runahead threads do not block commit on a long-latency load but instead execute subsequent instructions in a speculative execution mode to expose memory-level parallelism (MLP) through prefetching. The benefit of runahead SMT threads is twofold: (i) runahead threads do not clog resources on a long-latency load, and (ii) runahead threads exploit far-distance MLP. This paper proposes MLP-aware runahead threads: runahead execution is initiated only when there is far-distance MLP to be exploited. By doing so, useless runahead executions are eliminated, thereby reducing the number of speculatively executed instructions (and thus energy consumption) while preserving the performance of the runahead thread and potentially improving the performance of the co-executing thread(s). Our experimental results show that MLP-aware runahead threads reduce the number of speculatively executed instructions by 13.9% and 10.1% for two-program and four-program workloads, respectively, compared to MLP-agnostic runahead threads, while achieving comparable system throughput and job turnaround time.
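
The decision described in the abstract (enter runahead mode only when far-distance MLP is expected) can be illustrated with a minimal sketch. It is not the paper's mechanism. Everything named here is hypothetical: the `MlpPredictor` table indexed by load PC, the notion of a predicted "MLP distance" in instructions, and the rule that MLP counts as far-distance when that distance exceeds the instruction window size are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RUNAHEAD = "runahead"        # enter speculative runahead mode
    NO_RUNAHEAD = "no_runahead"  # handle the miss without runahead


@dataclass
class MlpPredictor:
    """Hypothetical predictor: for each load PC, remembers the distance (in
    instructions) to the farthest independent long-latency load last observed."""
    table: dict = field(default_factory=dict)

    def predict(self, pc: int) -> int:
        # Assumption: a load that has never been seen predicts no MLP.
        return self.table.get(pc, 0)

    def update(self, pc: int, observed_distance: int) -> None:
        self.table[pc] = observed_distance


def on_long_latency_load(pc: int, predictor: MlpPredictor, window_size: int) -> Action:
    """Decide whether a thread should enter runahead mode on a long-latency load.

    Assumption: MLP is "far-distance" when the predicted distance exceeds the
    instruction window, so it could not be exposed without running ahead.
    """
    if predictor.predict(pc) > window_size:
        return Action.RUNAHEAD
    return Action.NO_RUNAHEAD
```

With a hypothetical 128-entry window, a load whose last observed MLP distance was 300 instructions would trigger runahead, while a load with a distance of 40 or with no recorded history would not. Skipping the latter cases is what avoids the useless runahead executions the abstract refers to.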