Area and System Clock Effects on SMT/CMP Throughput

Authors:
James Burns;Jean-Luc Gaudiot
Affiliations:
-;IEEE
Venue:
IEEE Transactions on Computers
Year:
2005



Abstract

Two approaches to high-throughput processors are Chip Multi-Processing (CMP) and Simultaneous Multi-Threading (SMT). CMP increases layout efficiency, which allows more functional units and a faster clock rate. However, CMP suffers from hardware partitioning of functional resources. SMT increases functional unit utilization by issuing instructions simultaneously from multiple threads. However, a wide-issue SMT suffers from layout and technology implementation problems. We use silicon resources as our basis for comparison and find that area and system clock have a large effect on the optimal SMT/CMP design trade-off. We show the area overhead of SMT on each processor and how it scales with the width of the processor pipeline and the number of SMT threads. The wide-issue SMT delivers the highest single-thread performance with improved multithread throughput. However, multiple smaller cores deliver the highest throughput. We also explore alternate processor configurations that trade off SMT threads for other microarchitecture features. The result is a small increase in single-thread performance, but a fairly large reduction in throughput.
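The abstract's basis for comparison (equal silicon area, with the system clock taken into account) can be illustrated with a minimal sketch. This is not the authors' model. It assumes that chip throughput is simply the number of cores that fit in the area budget, multiplied by per-core IPC and clock rate. The names `CoreConfig`, `cores_that_fit`, and `chip_throughput`, and every number in the usage lines, are hypothetical placeholders rather than values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """One candidate core design (all fields are illustrative assumptions)."""
    name: str
    ipc_per_core: float   # aggregate instructions per cycle across the core's threads
    clock_ghz: float      # achievable system clock for this design
    area_mm2: float       # silicon area of one core, including any SMT overhead


def cores_that_fit(core: CoreConfig, area_budget_mm2: float) -> int:
    """Number of whole cores that fit in a fixed silicon area budget."""
    return int(area_budget_mm2 // core.area_mm2)


def chip_throughput(core: CoreConfig, area_budget_mm2: float) -> float:
    """Chip throughput in billions of instructions per second.

    Assumes throughput scales linearly with core count, which ignores
    shared-cache and memory-bandwidth contention.
    """
    return cores_that_fit(core, area_budget_mm2) * core.ipc_per_core * core.clock_ghz


# Hypothetical placeholder designs, NOT data from the paper.
wide_smt = CoreConfig("wide-issue SMT", ipc_per_core=4.0, clock_ghz=1.5, area_mm2=100.0)
small_core = CoreConfig("small core", ipc_per_core=1.5, clock_ghz=2.0, area_mm2=25.0)

for design in (wide_smt, small_core):
    print(design.name, chip_throughput(design, area_budget_mm2=200.0))
```

Comparing designs at a fixed area budget, rather than at a fixed core count, is what lets clock rate and per-core area enter the trade-off alongside IPC.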