The energy efficiency of CMP vs. SMT for multimedia workloads

Authors:
Ruchira Sasanka;Sarita V. Adve;Yen-Kuang Chen;Eric Debes
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;Intel Corporation;Intel Corporation
Venue:
Proceedings of the 18th annual international conference on Supercomputing
Year:
2004

Abstract

This paper compares the energy efficiency of chip multiprocessing (CMP) and simultaneous multithreading (SMT) on modern out-of-order processors for increasingly important multimedia applications. Since performance is an important metric for real-time multimedia applications, we compare configurations at equal performance. We perform this comparison for a large number of performance points derived using different processor architectures and frequencies/voltages.

We find that for the design space explored, for each workload, at each performance point, CMP is more energy efficient than SMT. The difference is small for two-thread systems, but large (18% to 44%) for four-thread systems. We also find that the best SMT and the best CMP configurations for a given performance target have different architectures and frequency/voltage settings. Therefore, their relative energy efficiency depends on a subtle interplay between various factors such as capacitance, voltage, IPC, frequency, and the level of clock gating, as well as workload features. We perform a detailed analysis considering these factors and develop a mathematical model to explain these results.

Although CMP shows a clear energy advantage for four-thread (and higher) workloads, it comes at the cost of increased silicon area. We therefore investigate a hybrid solution where a CMP is built out of SMT cores, and find it to be an effective compromise. Finally, we find that we can reduce energy further for CMP with a straightforward application of previously proposed techniques of adaptive architectures and dynamic voltage/frequency scaling.
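
The equal-performance comparison described in the abstract can be illustrated with a minimal sketch. It assumes the textbook dynamic-power relation P = a * C * V^2 * f, which is not necessarily the mathematical model developed in the paper. The `Config` class, the names `config_a` and `config_b`, and every numeric value are hypothetical, chosen only so that both configurations deliver the same throughput (IPC times frequency); they are not results from the paper and do not represent measured SMT or CMP designs.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical processor configuration; all values are illustrative."""
    activity: float       # average switching activity (lower with more clock gating)
    capacitance_f: float  # effective switched capacitance, in farads
    voltage_v: float      # supply voltage, in volts
    frequency_hz: float   # clock frequency, in hertz
    ipc: float            # aggregate instructions per cycle over all threads


def throughput(cfg: Config) -> float:
    """Instructions per second: IPC times clock frequency."""
    return cfg.ipc * cfg.frequency_hz


def dynamic_energy(cfg: Config, instructions: float) -> float:
    """Dynamic energy in joules to retire `instructions`.

    Assumes P = a * C * V^2 * f (textbook model, not the paper's own).
    """
    power_w = (cfg.activity * cfg.capacitance_f
               * cfg.voltage_v ** 2 * cfg.frequency_hz)
    runtime_s = instructions / throughput(cfg)
    return power_w * runtime_s


# Two made-up design points with equal throughput (4e9 instructions/s).
config_a = Config(activity=0.5, capacitance_f=2e-9, voltage_v=1.2,
                  frequency_hz=2.0e9, ipc=2.0)
config_b = Config(activity=0.5, capacitance_f=3e-9, voltage_v=1.0,
                  frequency_hz=1.6e9, ipc=2.5)

instructions = 1e9
energy_a = dynamic_energy(config_a, instructions)
energy_b = dynamic_energy(config_b, instructions)

print(f"throughput A: {throughput(config_a):.3e} instr/s")
print(f"throughput B: {throughput(config_b):.3e} instr/s")
print(f"energy A: {energy_a:.3f} J, energy B: {energy_b:.3f} J")
```

Under this simplified model the frequency cancels out of the energy expression, leaving energy per instruction proportional to a * C * V^2 / IPC. Frequency therefore matters mainly through the supply voltage it requires, which is one way to see why a configuration with higher capacitance can still use less energy at the same performance if it runs at a lower voltage or achieves a higher IPC.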