Though the prime targets of multicore architectures are parallel and multithreaded workloads (which favor maximum core count), executing sequential code fast remains critical (and benefits from maximum core size). This poses a difficult design trade-off. Core Fusion is a recently proposed reconfigurable multicore architecture that attempts to circumvent this compromise by dynamically "fusing" groups of fundamentally independent cores into larger, more aggressive processors as needed. In this way, it accommodates highly parallel, partially parallel, multiprogrammed, and sequential codes with ease. However, the sequential performance of the original fused configuration falls well short of that of an area-equivalent, monolithic, out-of-order processor. This paper effectively eliminates the fusion deficit for sequential codes by attacking two major sources of inefficiency: collective commit and instruction steering. We demonstrate in detail that these modifications allow Core Fusion to essentially match the performance of an area-equivalent, monolithic, out-of-order processor. The implication is that including wide-issue cores in future multicore designs may be unnecessary.