Performance implications of single thread migration on a chip multi-core

  • Authors:
  • Theofanis Constantinou; Yiannakis Sazeides; Pierre Michaud; Damien Fetis; Andre Seznec

  • Affiliations:
  • University of Cyprus, Nicosia, Cyprus; University of Cyprus, Nicosia, Cyprus; Irisa/Inria, Rennes Cedex, France; Irisa/Inria, Rennes Cedex, France; Irisa/Inria, Rennes Cedex, France

  • Venue:
  • ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
  • Year:
  • 2005

Abstract

High-performance multi-core processors are becoming an industry reality. Although multi-cores are well suited to multithreaded and multi-programmed workloads, many applications are still single-threaded, so multi-core performance with a single-thread workload is an important issue. Furthermore, recent studies suggest that performance, power, and temperature considerations of future multi-cores may necessitate activity migration between cores.

Motivated by the above, this paper investigates the performance implications of single-thread migration on a multi-core. Specifically, the study considers how the following migration and multi-core parameters influence the performance of a single thread: frequency of migration, core warm-up modes, subset of resources that are warmed up, number of cores, and cache hierarchy organization. The results of this study can provide insight to architects on how to design performance-efficient power and thermal strategies for a multi-core chip.

The experimental results, for the benchmarks and microarchitectures used in this study, show that the performance loss due to activity migration on a multi-core with private L1s and a shared L2 can be minimized if: (a) a migrating thread continues its execution on a core that was previously visited by the thread, and (b) cores remember their predictor state since their previous activation (all other core resources can be cold). The analogous conclusions for a multi-core with private L1s and L2s and a shared L3 are: remembering the predictor state, keeping the tags of the various L2 caches coherent, and allowing L2-L2 data transfers from inactive cores to the active core.

The data also show that when the migration period is at least 160K cycles, the transfer of register state between two cores and the flushing of dirty private L1 data have a negligible performance overhead.
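
As a rough illustration of why a fixed per-migration cost becomes negligible at long migration periods, the sketch below amortizes a one-off penalty over the cycles executed between migrations. This is a back-of-envelope model, not the simulation methodology of the paper: the function name `migration_overhead` and the 1,000-cycle penalty are assumptions chosen purely for illustration, and the abstract itself reports no per-migration cycle cost.

```python
def migration_overhead(period_cycles: int, penalty_cycles: int) -> float:
    """Fraction of total execution time spent on migration penalties.

    Toy model (an assumption, not taken from the paper): the thread runs
    `period_cycles` of useful work, then pays a fixed `penalty_cycles`
    for register-state transfer and dirty-L1 flushing, and repeats.
    """
    if period_cycles <= 0:
        raise ValueError("period_cycles must be positive")
    if penalty_cycles < 0:
        raise ValueError("penalty_cycles must be non-negative")
    return penalty_cycles / (period_cycles + penalty_cycles)


# Hypothetical fixed cost per migration, in cycles (illustrative only).
ASSUMED_PENALTY_CYCLES = 1_000

if __name__ == "__main__":
    for period in (10_000, 40_000, 160_000, 640_000):
        overhead = migration_overhead(period, ASSUMED_PENALTY_CYCLES)
        print(f"period={period:>7} cycles  overhead={overhead:.2%}")
```

Under this simple model the overhead shrinks roughly in inverse proportion to the migration period, which agrees in direction with the abstract's observation; the model ignores cache and predictor warm-up effects, which the study treats separately.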