SMT Layout Overhead and Scalability
IEEE Transactions on Parallel and Distributed Systems
Abstract: Two approaches to high-throughput processors are Chip Multi-Processing (CMP) and Simultaneous Multi-Threading (SMT). CMP increases layout efficiency, which allows more functional units and a faster clock rate. However, CMP suffers from hardware partitioning of functional resources. SMT increases functional unit utilization by issuing instructions simultaneously from multiple threads. However, a wide-issue SMT suffers from layout and technology implementation problems. We use silicon resources as our basis for comparison and find that area and system clock have a large effect on the optimal SMT/CMP design trade-off. We show the area overhead of SMT on each processor and how it scales with the width of the processor pipeline and the number of SMT threads. The wide-issue SMT delivers the highest single-thread performance with improved multi-thread throughput. However, multiple smaller cores deliver the highest throughput.
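
To make the silicon-area basis of comparison concrete, the following is a minimal sketch, not the paper's model. It assumes SMT area overhead grows linearly with the number of extra threads, and every name in it (CoreDesign, smt_overhead, cores_that_fit, chip_throughput) is hypothetical. The numbers in the usage lines are placeholders chosen for easy arithmetic, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreDesign:
    """Hypothetical description of one core; all fields are illustrative."""
    name: str
    base_area: float     # area of the core without SMT, arbitrary units
    smt_threads: int     # hardware threads per core (1 means no SMT)
    smt_overhead: float  # ASSUMED fractional area added per extra thread
    clock: float         # relative system clock rate
    ipc_per_core: float  # aggregate instructions per cycle over all threads

    def area(self) -> float:
        # Assumption: overhead is linear in the number of extra SMT threads.
        return self.base_area * (1.0 + self.smt_overhead * (self.smt_threads - 1))


def cores_that_fit(design: CoreDesign, area_budget: float) -> int:
    """Number of whole cores of this design that fit in the area budget."""
    return int(area_budget // design.area())


def chip_throughput(design: CoreDesign, area_budget: float) -> float:
    """Aggregate throughput: cores x IPC per core x relative clock."""
    return cores_that_fit(design, area_budget) * design.ipc_per_core * design.clock


if __name__ == "__main__":
    # Placeholder inputs only; substitute measured area, clock, and IPC values.
    wide_smt = CoreDesign("wide-smt", 4.0, 2, 0.25, 1.0, 3.0)
    small_core = CoreDesign("small", 2.0, 1, 0.0, 1.0, 1.0)
    for design in (wide_smt, small_core):
        print(design.name, design.area(), chip_throughput(design, 16.0))
```

Fixing the area budget and varying only the core design mirrors the comparison the abstract describes: a design's clock rate and per-thread area overhead both enter the result, so neither can be ignored when ranking SMT against CMP.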