Performance implications of single thread migration on a chip multi-core

Authors:
Theofanis Constantinou; Yiannakis Sazeides; Pierre Michaud; Damien Fetis; Andre Seznec
Affiliations:
University of Cyprus, Nicosia, Cyprus; University of Cyprus, Nicosia, Cyprus; Irisa/Inria, Rennes Cedex, France; Irisa/Inria, Rennes Cedex, France; Irisa/Inria, Rennes Cedex, France
Venue:
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Year:
2005

Abstract

High-performance multi-core processors are becoming an industry reality. Although multi-cores are suited to multithreaded and multi-programmed workloads, many applications are still single-threaded, so the performance of a multi-core running a single-thread workload is an important issue. Furthermore, recent studies suggest that performance, power and temperature considerations in future multi-cores may necessitate activity migration between cores.

Motivated by the above, this paper investigates the performance implications of single-thread migration on a multi-core. Specifically, the study considers how the following migration and multi-core parameters influence the performance of a single thread: migration frequency, core warm-up modes, the subset of resources that are warmed up, the number of cores, and the cache hierarchy organization. The results can give architects insight into how to design performance-efficient power and thermal management strategies for a multi-core chip.

For the benchmarks and microarchitectures used in this study, the experimental results show that the performance loss due to activity migration on a multi-core with private L1s and a shared L2 can be minimized if: (a) a migrating thread continues its execution on a core that it has previously visited, and (b) cores retain their predictor state from their previous activation (all other core resources can be cold). The analogous conclusions for a multi-core with private L1s and L2s and a shared L3 are: retain the predictor state, keep the tags of the various L2 caches coherent, and allow L2-to-L2 data transfers from inactive cores to the active core.

The data also show that when the migration period is at least 160K cycles (that is, at most one migration every 160K cycles), transferring the register state between two cores and flushing dirty private L1 data have a negligible performance overhead.
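Why conditions (a) and (b) help can be illustrated with a toy model. The following is a minimal Python sketch, not the authors' simulator. It assumes a hypothetical per-core bimodal predictor (a table of 2-bit saturating counters), a synthetic branch trace of 64 static branches with fixed directions, and round-robin migration measured in branches rather than cycles. All names (`BimodalPredictor`, `synthetic_trace`, `simulate`) and parameter values are illustrative placeholders.

```python
class BimodalPredictor:
    """Hypothetical toy predictor: one 2-bit saturating counter per table entry."""

    def __init__(self, size=64):
        self.size = size
        self.reset()

    def reset(self):
        # Counter value 1 = weakly not-taken; a "cold" core starts here.
        self.counters = [1] * self.size

    def predict_and_update(self, pc, taken):
        """Predict the branch at `pc`, train on the outcome, return True if correct."""
        idx = pc % self.size
        prediction = self.counters[idx] >= 2
        if taken:
            self.counters[idx] = min(3, self.counters[idx] + 1)
        else:
            self.counters[idx] = max(0, self.counters[idx] - 1)
        return prediction == taken


def synthetic_trace(num_branches):
    """Hypothetical trace: 64 static branches, each always taken or never taken."""
    for i in range(num_branches):
        pc = i % 64
        yield pc, pc % 3 != 0


def simulate(num_cores, period, num_branches, retain_predictor):
    """Migrate one thread round-robin across cores every `period` branches.

    Round-robin guarantees the thread returns to previously visited cores.
    If `retain_predictor` is False, a core's predictor is reset whenever it
    is (re)activated, modelling a core that wakes up cold.
    Returns the total number of branch mispredictions.
    """
    cores = [BimodalPredictor() for _ in range(num_cores)]
    mispredictions = 0
    active = None
    for i, (pc, taken) in enumerate(synthetic_trace(num_branches)):
        if i % period == 0:
            active = cores[(i // period) % num_cores]
            if not retain_predictor:
                active.reset()
        if not active.predict_and_update(pc, taken):
            mispredictions += 1
    return mispredictions


if __name__ == "__main__":
    for retain in (True, False):
        misses = simulate(num_cores=4, period=1_000,
                          num_branches=100_000, retain_predictor=retain)
        print(f"retain_predictor={retain}: {misses} mispredictions")
```

With these placeholder settings, the run that retains predictor state mispredicts 168 times: each of the 4 cores relearns the 42 taken branches once. The cold-start run mispredicts 4,200 times, because all 100 activations pay that relearning cost again. These counts describe only this synthetic trace, not the paper's measurements.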