In this paper we study the energy efficiency of simultaneous multithreading (SMT) and chip multiprocessing (CMP) with multiclustering. Through a detailed design space exploration, we show that clustering narrows the energy efficiency gap between SMT and CMP at equal performance points. Specifically, the energy efficiency advantage of CMP over SMT at a given performance level shrinks from a maximum of 25% for monolithic processors to 6% when the processor resources are clustered. By carefully considering floorplans, we show that this is enabled in part by the small energy consumption (less than 3%) of the interconnection buses required for clustering, even with SMT. As the gap narrows, the relative efficiency of SMT and CMP depends on the contribution of leakage energy: at lower leakage the CMP tends to be more efficient than the SMT, while at higher leakage levels the SMT becomes the more efficient design. We demonstrate these results over a wide range of performance and machine configurations.