Performance-asymmetry-aware scheduling for Chip Multiprocessors with static core coupling

  • Authors:
  • Jianbo Dong;Lei Zhang;Yinhe Han;Guihai Yan;Xiaowei Li

  • Affiliations:
  • Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, PR China and Graduate University of Chinese Academy of Sciences, Beijin ...;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, PR China;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, PR China and Graduate University of Chinese Academy of Sciences, Beijin ...;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, PR China and Graduate University of Chinese Academy of Sciences, Beijin ...;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, PR China and Graduate University of Chinese Academy of Sciences, Beijin ...

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2010

Abstract

Thread-level redundancy is an efficient approach for transient fault detection and recovery in Chip Multiprocessors (CMPs), in which two adjacent cores are statically coupled to form a functional Dual Modular Redundancy (DMR) pair. Manufacturing process variations cause core-to-core (C2C) performance asymmetry across the chip, which can be further divided into asymmetry among core-pairs and asymmetry within a core-pair. We call these inter-pair and intra-pair asymmetries, respectively; both should be taken into consideration when scheduling applications on CMPs with static core coupling. In this paper, we first formulate this scheduling problem as a 0-1 programming problem that maximizes the system Weighted Throughput. We then propose an efficient IVF&AppSen algorithm, which we prove to be optimal when the number of applications equals the number of core-pairs. We also adapt the Simulated Annealing technique to handle the case where there are fewer applications than core-pairs on the chip. Simulations on a 64-core CMP show that the proposed algorithms achieve a 2.5-9.3% improvement in Weighted Throughput over the prior VarF&AppIPC algorithm.
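
To make the 0-1 formulation concrete, the following is a minimal sketch of a generic assignment problem of this kind. It is not the paper's IVF&AppSen or Simulated Annealing algorithm. The function name `best_assignment`, the `gain` matrix, and its numeric values are hypothetical and chosen only for illustration. Each entry `gain[a][p]` stands for an assumed weighted-throughput contribution of application `a` when it runs on core-pair `p`, and each application is mapped to a distinct core-pair. Exhaustive search is used here only because the instance is tiny.

```python
from itertools import permutations


def best_assignment(gain):
    """Exhaustively solve a small 0-1 assignment problem.

    gain[a][p] is a hypothetical weighted-throughput contribution of
    application a on core-pair p (illustrative values, not from the paper).
    Each application gets exactly one core-pair, and no core-pair is shared.
    Requires len(gain) <= len(gain[0]), i.e. no more apps than core-pairs.
    """
    n_apps = len(gain)
    n_pairs = len(gain[0])
    best_value, best_choice = float("-inf"), None
    # Each tuple lists, per application, the index of its chosen core-pair.
    for choice in permutations(range(n_pairs), n_apps):
        value = sum(gain[a][p] for a, p in enumerate(choice))
        if value > best_value:
            best_value, best_choice = value, choice
    return best_choice, best_value


# Two applications, three core-pairs (fewer applications than core-pairs).
gain = [
    [0.90, 0.80, 0.70],  # application 0: mildly sensitive to pair speed
    [0.95, 0.60, 0.50],  # application 1: strongly sensitive to pair speed
]
choice, value = best_assignment(gain)
print(choice, round(value, 2))
```

In this toy instance the search gives the fastest core-pair to the application that loses the most on slower pairs, rather than to the one with the highest raw score. Exhaustive search grows factorially with the number of core-pairs, which is why a chip with many core-pairs calls for a dedicated algorithm or a heuristic instead.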