Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture

  • Authors:
  • Janani Mukundan;Saugata Ghose;Robert Karmazin;Engin İpek;José F. Martínez

  • Affiliations:
  • Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;University of Rochester, Rochester, NY, USA;Cornell University, Ithaca, NY, USA

  • Venue:
  • Proceedings of the 26th ACM international conference on Supercomputing
  • Year:
  • 2012

Abstract

Though the prime target of multicore architectures is parallel and multithreaded workloads (which favor maximum core count), executing sequential code fast remains critical (which benefits from maximum core size). This poses a difficult design trade-off. Core Fusion is a recently proposed reconfigurable multicore architecture that attempts to circumvent this compromise by dynamically "fusing" groups of fundamentally independent cores into larger, more aggressive processors as needed. In this way, it accommodates highly parallel, partially parallel, multiprogrammed, and sequential codes with ease. However, the sequential performance of the original fused configuration falls well short of that of an area-equivalent, monolithic, out-of-order processor. This paper effectively eliminates the fusion deficit for sequential codes by attacking two major sources of inefficiency: collective commit and instruction steering. We demonstrate in detail that these modifications allow Core Fusion to essentially match the performance of an area-equivalent, monolithic, out-of-order processor. The implication is that the inclusion of wide-issue cores in future multicore designs may be unnecessary.