SMT Layout Overhead and Scalability

Authors: James Burns; Jean-Luc Gaudiot
Affiliations: Pennsylvania State Univ., University Park; Univ. of Kentucky
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 2002

References (16)

Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture

Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques

Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture

The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)

Alternative fetch and issue policies for the trace cache fetch mechanism
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems

The MIPS R10000 Superscalar Microprocessor
IEEE Micro

The HP PA-8000 RISC CPU
IEEE Micro

Simultaneous Multithreading: A Platform for Next-Generation Processors
IEEE Micro

Area and System Clock Effects on SMT/CMP Processors
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques

Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture

A Fine-Grain Multithreading Superscalar Architecture
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

A Scalable Register File Architecture for Dynamically Scheduled Processors
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Branch Prediction and Simultaneous Multithreading
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Cited By (15)

A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)

Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture

Predictable performance in SMT processors
Proceedings of the 1st conference on Computing frontiers

Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture

Understanding the energy efficiency of simultaneous multithreading
Proceedings of the 2004 international symposium on Low power electronics and design

Area and System Clock Effects on SMT/CMP Throughput
IEEE Transactions on Computers

A Master-Slave Adaptive Load-Distribution Processor Model on PCA
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04

Speculative pre-execution assisted by compiler (SPEAR)
Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms

Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing

Future ILP processors
International Journal of High Performance Computing and Networking

Optimising long-latency-load-aware fetch policies for SMT processors
International Journal of High Performance Computing and Networking

Transparent reconfigurable acceleration for heterogeneous embedded applications
Proceedings of the conference on Design, automation and test in Europe

A phase adaptive cache hierarchy for SMT processors
Microprocessors & Microsystems

Static partitioning vs dynamic sharing of resources in simultaneous multithreading microarchitectures
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies

Low-latency adaptive mode transitions and hierarchical power management in asymmetric clustered cores
ACM Transactions on Architecture and Code Optimization (TACO)

Abstract

Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. Although SMT can be added to an existing microarchitecture with relatively low overhead, the chip area it consumes could instead be used for other resources, such as more functional units, larger caches, or better branch predictors. How large is the SMT overhead, and at what point does SMT stop paying off for maximum throughput compared with adding other architectural features? This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect-level analysis of the layout. We discuss microarchitecture issues that impact SMT implementations and show how the Instruction Set Architecture (ISA) and the microarchitecture can have a large effect on SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead.
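The abstract treats SMT as a trade-off between area overhead and throughput. One simple way to make the "does SMT still pay off" question concrete is to compare throughput per unit area, with all values normalized to a baseline core. This is an illustrative model, not the paper's methodology. The names `Design`, `smt_pays_off`, and `break_even_gain` are hypothetical, and throughput per area is an assumed criterion. The numbers in the usage section are also made up and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """One candidate use of extra die area (hypothetical model).

    All quantities are normalized to a baseline core with throughput 1.0
    and area 1.0.
    """
    name: str
    area_overhead: float    # extra area as a fraction of the baseline (0.10 = +10%)
    throughput_gain: float  # extra throughput as a fraction of the baseline (0.30 = +30%)

    def throughput_per_area(self) -> float:
        # Normalized throughput divided by normalized area.
        return (1.0 + self.throughput_gain) / (1.0 + self.area_overhead)


def smt_pays_off(smt: Design, alternative: Design) -> bool:
    """True if SMT delivers more throughput per unit area than the alternative.

    Throughput per area is an assumed criterion for illustration only;
    the paper's own comparison may differ.
    """
    return smt.throughput_per_area() > alternative.throughput_per_area()


def break_even_gain(smt_area_overhead: float, alternative: Design) -> float:
    """Minimum SMT throughput gain needed to match the alternative's
    throughput per unit area, given SMT's own area overhead."""
    return alternative.throughput_per_area() * (1.0 + smt_area_overhead) - 1.0


if __name__ == "__main__":
    # Made-up numbers, not results from the paper.
    smt = Design("SMT", area_overhead=0.10, throughput_gain=0.30)
    cache = Design("larger cache", area_overhead=0.10, throughput_gain=0.05)
    baseline = Design("baseline", area_overhead=0.0, throughput_gain=0.0)

    print(f"{smt.throughput_per_area():.3f}")        # 1.182
    print(f"{cache.throughput_per_area():.3f}")      # 0.955
    print(smt_pays_off(smt, cache))                  # True
    print(f"{break_even_gain(0.10, cache):.3f}")     # 0.050
    # Against the unmodified core, SMT must gain more than its area overhead.
    print(f"{break_even_gain(0.10, baseline):.3f}")  # 0.100
```

The last line shows the simplest form of the condition. Measured against an unmodified core, SMT improves throughput per unit area only if its fractional throughput gain exceeds its fractional area overhead. Against an alternative feature of equal area, the bar rises or falls with that feature's own gain.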