Back-end assignment schemes for clustered multithreaded processors
Proceedings of the 18th annual international conference on Supercomputing
Dynamically Controlled Resource Allocation in SMT Processors
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Evaluating the impact of simultaneous multithreading on network servers using real hardware
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Virtual multiprocessor: an analyzable, high-performance architecture for real-time computing
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Learning-Based SMT Processor Resource Distribution via Hill-Climbing
Proceedings of the 33rd annual international symposium on Computer Architecture
Adaptive reorder buffers for SMT processors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Exploiting Operand Availability for Efficient Simultaneous Multithreading
IEEE Transactions on Computers
Fairness and Throughput in Switch on Event Multithreading
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Fairness enforcement in switch on event multithreading
ACM Transactions on Architecture and Code Optimization (TACO)
Addressing thermal nonuniformity in SMT workloads
ACM Transactions on Architecture and Code Optimization (TACO)
An adaptive resource partitioning algorithm for SMT processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
DLL-conscious instruction fetch optimization for SMT processors
Journal of Systems Architecture: the EUROMICRO Journal
Hill-climbing SMT processor resource distribution
ACM Transactions on Computer Systems (TOCS)
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Per-thread cycle accounting in SMT processors
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Memory-level parallelism aware fetch policies for simultaneous multithreading processors
ACM Transactions on Architecture and Code Optimization (TACO)
A swarm-inspired resource distribution for SMT processors
Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Sytems
The impact of speculative execution on SMT processors
International Journal of Parallel Programming
Service level agreement for multithreaded processors
ACM Transactions on Architecture and Code Optimization (TACO)
The Impact of Resource Sharing Control on the Design of Multicore Processors
ICA3PP '09 Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Probabilistic job symbiosis modeling for SMT processor scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
A predictable simultaneous multithreading scheme for hard real-time
ARCS'08 Proceedings of the 21st international conference on Architecture of computing systems
Compatible phase co-scheduling on a CMP of multi-threaded processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A phase adaptive cache hierarchy for SMT processors
Microprocessors & Microsystems
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
An in-order SMT architecture with static resource partitioning for consumer applications
PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies
How to enhance a superscalar processor to provide hard real-time capable in-order SMT
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Reliability-aware core partitioning in chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Probabilistic modeling for job symbiosis scheduling on SMT processors
ACM Transactions on Architecture and Code Optimization (TACO)
Fair CPU time accounting in CMP+SMT processors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors
Computers and Electrical Engineering
FROCM: a fair and low-overhead method in SMT processor
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Simultaneous multithreading (SMT) increases processor throughput by multiplexing resources among several threads. Despite the commercial availability of SMT processors, several aspects of this resource sharing are not well understood. For example, academic SMT studies typically assume that resources are shared dynamically, while industrial designs tend to divide resources statically among threads.This study seeks to quantify the performance impact of resource partitioning policies in SMT machines, focusing on the execution portion of the pipeline. We find that for storageresources, such as the instruction queue and reorder buffer, statically allocating an equal portion to each thread provides good performance, in part by avoiding starvation. The enforced fairness provided by this partitioning obviates sophisticated fetch policies to a large extent. SMT's potential ability to allocate storage resources dynamically across threads doesnot appear to be of significant benefit.In contrast, static division of issue bandwidth has a negative impact on throughput. SMT's ability to multiplex bursty execution streams dynamically onto shared function units contributesto its overall throughput.Finally, we apply these insights to SMT support in clustered architectures. Assigning threads to separate clusters eliminates inter-cluster communication; however, in some circumstances, the resulting partitioning of issue bandwidth cancels out the performance benefit of eliminating communication.