Predictable performance in SMT processors

Authors:
Francisco J. Cazorla;Peter M.W. Knijnenburg;Rizos Sakellariou;Enrique Fernández;Alex Ramirez;Mateo Valero
Affiliations:
DAC, UPC, Spain;Leiden University, The Netherlands;University of Manchester, United Kingdom;Universidad de Las Palmas de Gran Canaria, Spain;DAC, UPC, Spain;DAC, UPC, Spain
Venue:
Proceedings of the 1st conference on Computing frontiers
Year:
2004

Abstract

Current instruction fetch policies in SMT processors are oriented towards optimization of overall throughput and/or fairness. However, they provide no control over how individual threads are executed, leading to performance unpredictability, since the IPC of a thread depends on the workload it is executed in and on the fetch policy used.

From the point of view of the Operating System (OS), it is the job scheduler that determines how jobs are executed. However, when the OS runs on an SMT processor, the job scheduler cannot guarantee execution time constraints of any job due to this performance unpredictability.

In this paper we propose a novel kind of collaboration between the OS and the SMT hardware that enables the OS to enforce that a high-priority thread runs at a specific fraction of its full speed. We present an extensive evaluation using many different workloads that shows that this mechanism gives the required performance in more than 97% of all cases considered, and even more than 99% for the less extreme cases. At the same time, our mechanism does not need to trade off predictability against overall throughput, as it maximizes the IPC of the remaining low-priority threads, giving 94% on average (and 97.5% on average for the less extreme cases) of the throughput obtained using instruction fetch policies oriented toward throughput maximization, such as icount.
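
The two quantities the abstract reports, whether the high-priority thread reaches the requested fraction of its full speed and how much throughput is retained relative to a throughput-oriented fetch policy, can be made concrete with a small calculation. The sketch below is only an illustration of those definitions, not the mechanism proposed in the paper. The names `Run`, `target_ipc`, `meets_target`, and `relative_throughput` are hypothetical, "full speed" is assumed to mean the IPC of the thread when it runs alone, and throughput is assumed to be the sum of the per-thread IPCs.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One hypothetical measurement of a workload on an SMT core."""
    hpt_ipc_alone: float   # IPC of the high-priority thread running alone (assumed "full speed")
    hpt_ipc_shared: float  # IPC of the same thread when run inside the workload
    lpt_ipc_shared: float  # combined IPC of the low-priority threads in the workload


def target_ipc(ipc_alone: float, fraction: float) -> float:
    """IPC the high-priority thread must reach for the requested fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return fraction * ipc_alone


def meets_target(run: Run, fraction: float) -> bool:
    """True if the measured shared IPC is at least the requested fraction."""
    return run.hpt_ipc_shared >= target_ipc(run.hpt_ipc_alone, fraction)


def relative_throughput(run: Run, baseline_total_ipc: float) -> float:
    """Total IPC of the run divided by the total IPC of a baseline policy."""
    if baseline_total_ipc <= 0.0:
        raise ValueError("baseline_total_ipc must be positive")
    return (run.hpt_ipc_shared + run.lpt_ipc_shared) / baseline_total_ipc


if __name__ == "__main__":
    # Made-up numbers, chosen only to exercise the functions.
    run = Run(hpt_ipc_alone=2.0, hpt_ipc_shared=1.05, lpt_ipc_shared=1.30)
    print("target met:", meets_target(run, fraction=0.5))
    print("relative throughput:", round(relative_throughput(run, 2.5), 3))
```

With these definitions, a "case" in the evaluation would count as successful when `meets_target` returns true for the requested fraction; the values used above are invented and do not come from the paper.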