Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture

Authors:
Janani Mukundan;Saugata Ghose;Robert Karmazin;Engin İpek;José F. Martínez
Affiliations:
Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA;University of Rochester, Rochester, NY, USA;Cornell University, Ithaca, NY, USA
Venue:
Proceedings of the 26th ACM international conference on Supercomputing
Year:
2012

Abstract

Though the prime target of multicore architectures is parallel and multithreaded workloads (which favor maximum core count), executing sequential code fast remains critical (which benefits from maximum core size). This poses a difficult design trade-off. Core Fusion is a recently proposed reconfigurable multicore architecture that attempts to circumvent this compromise by "fusing" groups of fundamentally independent cores into larger, more aggressive processors dynamically as needed. In this way, it accommodates highly parallel, partially parallel, multiprogrammed, and sequential codes with ease. However, the sequential performance of the original fused configuration falls quite short of that of an area-equivalent, monolithic, out-of-order processor. This paper effectively eliminates the fusion deficit for sequential codes by attacking two major sources of inefficiency: collective commit and instruction steering. We demonstrate in detail that these modifications allow Core Fusion to essentially match the performance of an area-equivalent monolithic out-of-order processor. The implication is that the inclusion of wide-issue cores in future multicore designs may be unnecessary.