ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Variability in the execution of multimedia applications and implications for architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Comparing power consumption of an SMT and a CMP DSP for mobile phone workloads
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Joint local and global hardware adaptations for energy
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Computer
Area and System Clock Effects on SMT/CMP Processors
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Soft Real- Time Scheduling on Simultaneous Multithreaded Processors
RTSS '02 Proceedings of the 23rd IEEE Real-Time Systems Symposium
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Optimizing Array-Intensive Applications for On-Chip Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Locality-conscious workload assignment for array-based computations in MPSOC architectures
Proceedings of the 42nd annual Design Automation Conference
Understanding the energy efficiency of SMT and CMP with multiclustering
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-aware computation duplication for improving reliability in embedded chip multiprocessors
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Automatic run-time extraction of communication graphs from multithreaded applications
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
High-level power analysis for multi-core chips
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Real-time rendering systems in 2010
SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
ALP: Efficient support for all levels of parallelism for complex media applications
ACM Transactions on Architecture and Code Optimization (TACO)
Using fine grain multithreading for energy efficient computing
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS
ACM SIGARCH Computer Architecture News
Hardware scheduling support in SMP architectures
Proceedings of the conference on Design, automation and test in Europe
A memory-conscious code parallelization scheme
Proceedings of the 44th annual Design Automation Conference
Thermal-aware scheduling for future chip multiprocessors
EURASIP Journal on Embedded Systems
Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies
IEEE Transactions on Computers
Addressing thermal nonuniformity in SMT workloads
ACM Transactions on Architecture and Code Optimization (TACO)
Power Consumption of GPUs from a Software Perspective
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
A Multi-Shared Register File Structure for VLIW Processors
Journal of Signal Processing Systems
A parallel infrastructure on dynamic EPIC SMT
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
Power and thermal characterization of POWER6 system
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Looking back on the language and hardware revolutions: measured power, performance, and scaling
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
The design space of CMP vs. SMT for high performance embedded processor
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
Looking back and looking forward: power, performance, and upheaval
Communications of the ACM
Register file management and compiler optimization on EDSMT
ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
A Parallel infrastructure on dynamic EPIC SMT and its speculation optimization
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Hi-index | 0.02 |
This paper compares the energy efficiency of chip multiprocessing (CMP) and simultaneous multithreading (SMT) on modern out-of-order processors for the increasingly important multimedia applications. Since performance is an important metric for real-time multimedia applications, we compare configurations at equal performance. We perform this comparison for a large number of performance points derived using different processor architectures and frequencies/voltages.We find that for the design space explored, for each workload, at each performance point, CMP is more energy efficient than SMT. The difference is small for two thread systems, but large (18% to 44%) for four thread systems. We also find that the best SMT and the best CMP configuration for a given performance target have different architecture and frequency/voltage. Therefore, their relative energy efficiency depends on a subtle interplay between various factors such as capacitance, voltage, IPC, frequency, and the level of clock gating, as well as workload features. We perform a detailed analysis considering these factors and develop a mathematical model to explain these results.Although CMP shows a clear energy advantage for four-thread (and higher) workloads, it comes at the cost of increased silicon area. We therefore investigate a hybrid solution where a CMP is built out of SMT cores, and find it to be an effective compromise. Finally, we find that we can reduce energy further for CMP with a straightforward application of previously proposed techniques of adaptive architectures and dynamic voltage/frequency scaling.