Memory-level parallelism aware fetch policies for simultaneous multithreading processors

Authors:
Stijn Eyerman;Lieven Eeckhout
Affiliations:
Ghent University, Gent, Belgium;Ghent University, Gent, Belgium
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2009

Citing 32
Cited 2

The effectiveness of multiple hardware contexts

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Code transformations to improve memory parallelism

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Predictor-directed stream buffers

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Improving 3D geometry transformations on a simultaneous multithreaded SIMD processor

ICS '01 Proceedings of the 15th international conference on Supercomputing
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Speculative precomputation: long-range prefetching of delinquent loads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors

IEEE Micro
Enhancing memory level parallelism via recovery-free value prediction

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Front-End Policies for Improved Issue Efficiency in SMT Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Alpha 21264 Microprocessor Architecture

ICCD '98 Proceedings of the International Conference on Computer Design
Picking Statistically Valid and Early Simulation Points

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Initial Observations of the Simultaneous Multithreading Pentium 4 Processor

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
The Impact of Resource Partitioning on SMT Processors

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism

Proceedings of the 31st annual international symposium on Computer architecture
A First-Order Superscalar Processor Model

Proceedings of the 31st annual international symposium on Computer architecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Dynamically Controlled Resource Allocation in SMT Processors

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Multithreaded Value Prediction

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Toward kilo-instruction processors

ACM Transactions on Architecture and Code Optimization (TACO)
Techniques for Efficient Processing in Runahead Execution Engines

Proceedings of the 32nd annual international symposium on Computer Architecture
A Case for MLP-Aware Cache Replacement

Proceedings of the 33rd annual international symposium on Computer Architecture
Learning-Based SMT Processor Resource Distribution via Hill-Climbing

Proceedings of the 33rd annual international symposium on Computer Architecture
A Memory-Level Parallelism Aware Fetch Policy for SMT Processors

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Optimising long-latency-load-aware fetch policies for SMT processors

International Journal of High Performance Computing and Networking
System-Level Performance Metrics for Multiprogram Workloads

IEEE Micro

Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors

Computers and Electrical Engineering
Recalling instructions from idling threads to maximize resource utilization for simultaneous multi-threading processors

Computers and Electrical Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

A thread executing on a simultaneous multithreading (SMT) processor that experiences a long-latency load will eventually stall while holding execution resources. Existing long-latency load aware SMT fetch policies limit the amount of resources allocated by a stalled thread by identifying long-latency loads and preventing the thread from fetching more instructions—and in some implementations, instructions beyond the long-latency load are flushed to release allocated resources. This article proposes an SMT fetch policy that takes into account the available memory-level parallelism (MLP) in a thread. The key idea proposed in this article is that in case of an isolated long-latency load (i.e., there is no MLP), the thread should be prevented from allocating additional resources. However, in case multiple independent long-latency loads overlap (i.e., there is MLP), the thread should allocate as many resources as needed in order to fully expose the available MLP. MLP-aware fetch policies achieve better performance for MLP-intensive threads on SMT processors, leading to higher overall system throughput and shorter average turnaround time than previously proposed fetch policies.