Forwardflow: a scalable core for power-constrained CMPs

Authors: Dan Gibson; David A. Wood
Affiliations: University of Wisconsin-Madison, Madison, WI, USA (both authors)
Venue: Proceedings of the 37th annual international symposium on Computer architecture
Year: 2010

Abstract

Chip Multiprocessors (CMPs) are now commodity hardware, but commoditization of parallel software remains elusive. In the near term, the current trend of increased core-per-socket count will continue, despite a lack of parallel software to exercise the hardware. Future CMPs must deliver thread-level parallelism when software provides threads to run, but must also continue to deliver performance gains for single threads by exploiting instruction-level parallelism and memory-level parallelism. However, power limitations will prevent conventional cores from exploiting both simultaneously. This work presents the Forwardflow Architecture, which can scale its execution logic up to run single threads, or down to run multiple threads in a CMP. Forwardflow dynamically builds an explicit internal dataflow representation from a conventional instruction set architecture, using forward dependence pointers to guide instruction wakeup, selection, and issue. Forwardflow's backend is organized into discrete units that can be individually (de-)activated, allowing each core's performance to be scaled by system software at the architectural level. On single threads, Forwardflow core scaling yields a mean runtime reduction of 21% for a 37% increase in power consumption. For multithreaded workloads, a Forwardflow-based CMP allows system software to select the performance point that best matches available power.
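
The abstract says that forward dependence pointers guide instruction wakeup, selection, and issue. A minimal software sketch of that idea follows, under stated assumptions: the names `Instr`, `build_pointers`, `run`, and `OPS`, the two-operand toy instruction set, and the FIFO issue policy are all hypothetical and chosen for illustration, and the abstract does not describe the actual hardware structures at this level of detail. Each value's producer points to its first consumer, and each consumer operand points to the next consumer of the same value, so a result wakes its dependents by walking a chain instead of broadcasting to every waiting instruction.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A slot names one field of one instruction: (index, "dest" | "src0" | "src1").
Slot = Tuple[int, str]

# Toy two-operand operations (an assumption; not taken from the source).
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: List[str]
    next_ptr: Dict[str, Slot] = field(default_factory=dict)  # forward pointers
    values: Dict[str, int] = field(default_factory=dict)     # operands received
    result: Optional[int] = None


def build_pointers(program: List[Instr], regfile: Dict[str, int]) -> None:
    """Link each value's producer to its consumers, in program order."""
    last_ref: Dict[str, Slot] = {}  # most recent in-window reference per register
    for i, ins in enumerate(program):
        for k, reg in enumerate(ins.srcs):
            slot = (i, f"src{k}")
            if reg in last_ref:
                # Extend the chain: the previous reference now points here.
                prev_i, prev_field = last_ref[reg]
                program[prev_i].next_ptr[prev_field] = slot
                last_ref[reg] = slot
            else:
                # No in-window producer: read the register file directly.
                ins.values[slot[1]] = regfile[reg]
        # A new write starts a fresh chain for the destination register.
        last_ref[ins.dest] = (i, "dest")


def run(program: List[Instr]) -> List[int]:
    """Issue ready instructions; each result walks its pointer chain."""
    ready = deque(i for i, ins in enumerate(program)
                  if len(ins.values) == len(ins.srcs))
    order = []
    while ready:
        i = ready.popleft()
        ins = program[i]
        args = (ins.values[f"src{k}"] for k in range(len(ins.srcs)))
        ins.result = OPS[ins.op](*args)
        order.append(i)
        slot = ins.next_ptr.get("dest")
        while slot is not None:  # wake consumers one at a time
            j, fld = slot
            consumer = program[j]
            consumer.values[fld] = ins.result
            if len(consumer.values) == len(consumer.srcs):
                ready.append(j)
            slot = consumer.next_ptr.get(fld)
    return order


program = [
    Instr("add", "r3", ["r1", "r2"]),
    Instr("mul", "r4", ["r3", "r3"]),
    Instr("add", "r5", ["r3", "r4"]),
    Instr("add", "r3", ["r1", "r1"]),  # redefines r3, starting a new chain
    Instr("mul", "r6", ["r3", "r5"]),
]
build_pointers(program, {"r1": 2, "r2": 3})
order = run(program)
results = [ins.result for ins in program]
print(order, results)
```

With the register values assumed here, the issue order is `[0, 3, 1, 2, 4]` and the per-instruction results are `[5, 25, 30, 4, 120]`: instruction 3 issues ahead of instructions 1 and 2 because its operands are available at dispatch, which is the out-of-order behavior the pointer chains are meant to enable in this sketch.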