Dynamically Controlled Resource Allocation in SMT Processors

Authors:
Francisco J. Cazorla; Alex Ramirez; Mateo Valero; Enrique Fernandez
Affiliations:
Universitat Politècnica de Catalunya, Spain; Universitat Politècnica de Catalunya, Spain; Universitat Politècnica de Catalunya, Spain; Universidad de Las Palmas de Gran Canaria, Spain
Venue:
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2004

Abstract

SMT processors increase performance by executing instructions from several threads simultaneously. Sharing the processor's resources lets these threads use them more efficiently, but the threads also compete for those resources. The way critical resources are distributed among threads determines the final performance. Currently, this distribution is determined implicitly by the fetch policy, which decides which threads enter the processor to compete for resources. However, current fetch policies rely only on indirect indicators of resource usage, which can lead either to monopolization of resources by a single thread or to resources sitting idle when no thread can use them. Both situations harm performance and arise, for example, after an L2 cache miss. In this paper, we introduce the concept of dynamic resource control in SMT processors. Using this concept, we propose a novel resource allocation policy for SMT processors. This policy directly monitors each thread's resource usage and guarantees that every thread gets its fair share of the critical shared resources, avoiding monopolization. We also define a mechanism that lets a thread borrow resources from another thread that does not need them, thereby reducing resource under-use. Simulation results show that our dynamic resource allocation policy outperforms a static resource allocation policy by 8% on average. It also outperforms the best dynamic resource-conscious fetch policies, such as FLUSH++, by 4% on average, using the harmonic mean as the metric. This indicates that our policy does not obtain its ILP boost by unfairly favoring high-ILP threads over slow memory-bound threads. Instead, it achieves a better balance between throughput and fairness.
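The two ideas in the abstract, a guaranteed fair share with borrowing of unused entries, and a harmonic-mean metric that rewards balanced progress, can be illustrated with a minimal Python sketch. It is not the paper's hardware mechanism: the function names `allocate` and `hmean_relative_ipc`, the equal-split definition of "fair share", the round-robin lending rule, and the choice to take the harmonic mean over per-thread relative IPC (IPC when co-running divided by IPC when running alone) are all assumptions made for illustration.

```python
from statistics import harmonic_mean


def allocate(total, demand):
    """Split `total` entries of one shared resource among threads.

    Hypothetical model: each thread is entitled to an equal share, and a
    thread that demands less than its share lends the surplus to others.
    """
    n = len(demand)
    share = total // n  # assumed definition of the "fair share"
    # Each thread first gets what it asks for, capped at its fair share.
    grant = [min(d, share) for d in demand]
    spare = total - sum(grant)
    # Lend spare entries one at a time, round-robin, to threads wanting more.
    while spare > 0:
        wanting = [i for i in range(n) if grant[i] < demand[i]]
        if not wanting:
            break  # nobody can use the remaining entries
        for i in wanting:
            if spare == 0:
                break
            grant[i] += 1
            spare -= 1
    return grant


def hmean_relative_ipc(ipc_smt, ipc_alone):
    """Harmonic mean of per-thread relative IPC (assumed metric form)."""
    relative = [s / a for s, a in zip(ipc_smt, ipc_alone)]
    return harmonic_mean(relative)


if __name__ == "__main__":
    # A thread needing few entries lends its surplus to a demanding thread.
    print(allocate(32, [4, 40]))
    # Two demanding threads are each held to the fair share.
    print(allocate(32, [40, 40]))
    # Balanced slowdowns score higher than lopsided ones.
    print(hmean_relative_ipc([1.0, 0.5], [2.0, 1.0]))
    print(hmean_relative_ipc([1.8, 0.1], [2.0, 1.0]))
```

In this model a demanding thread can never take entries that another thread is entitled to and actually wants, which is the anti-monopolization guarantee. The harmonic mean falls sharply when any one thread is starved, so a policy cannot score well simply by favoring the fastest thread.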