High-performance multi-core processors are becoming an industry reality. Although multi-cores are suited to multithreaded and multi-programmed workloads, many applications are still single-threaded, so multi-core performance on a single-thread workload remains an important issue. Furthermore, recent studies suggest that performance, power, and temperature considerations of future multi-cores may necessitate activity migration between cores.

Motivated by the above, this paper investigates the performance implications of single-thread migration on a multi-core. Specifically, the study considers how the following migration and multi-core parameters influence the performance of a single thread: frequency of migration, core warm-up modes, the subset of resources that are warmed up, number of cores, and cache hierarchy organization. The results of this study can give architects insight into how to design performance-efficient power and thermal strategies for a multi-core chip.

The experimental results, for the benchmarks and microarchitectures used in this study, show that the performance loss due to activity migration on a multi-core with private L1s and a shared L2 can be minimized if: (a) a migrating thread continues its execution on a core that the thread previously visited, and (b) cores retain their predictor state from their previous activation (all other core resources can be cold). The analogous conclusions for a multi-core with private L1s and L2s and a shared L3 are: retaining the predictor state, keeping the tags of the L2 caches coherent, and allowing L2-to-L2 data transfers from inactive cores to the active core.

The data also show that when the migration period is at least 160K cycles, the transfer of register state between two cores and the flushing of dirty private L1 data have a negligible performance overhead.
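One way to see why a longer migration period shrinks the fixed per-migration overhead is a simple amortization estimate. The sketch below is not the paper's methodology: it assumes a thread runs `period_cycles` of useful work and then pays a fixed `fixed_cost_cycles` per migration (register-state transfer plus dirty-L1 flush). Both names and the cost value used in the loop are hypothetical, since the abstract reports no per-migration cost, and the model ignores the warm-up effects that the study itself measures.

```python
def migration_overhead_fraction(period_cycles: int, fixed_cost_cycles: int) -> float:
    """Fraction of all cycles spent on fixed per-migration work.

    Assumed model: each interval consists of `period_cycles` of useful
    execution followed by `fixed_cost_cycles` of migration work.
    `fixed_cost_cycles` is a hypothetical placeholder, not a measured value.
    """
    if period_cycles <= 0:
        raise ValueError("period_cycles must be positive")
    if fixed_cost_cycles < 0:
        raise ValueError("fixed_cost_cycles must be non-negative")
    return fixed_cost_cycles / (period_cycles + fixed_cost_cycles)


if __name__ == "__main__":
    HYPOTHETICAL_COST = 500  # illustrative only; not taken from the paper
    for period in (10_000, 40_000, 160_000, 640_000):
        frac = migration_overhead_fraction(period, HYPOTHETICAL_COST)
        print(f"period={period:>7} cycles  overhead={frac:.4%}")
```

Under this assumed model, the overhead fraction falls roughly in inverse proportion to the migration period, which is consistent with the qualitative claim that the fixed costs become negligible at sufficiently long periods.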