Instruction issue logic for high-performance, interruptable pipelined processors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Executing a Program on the MIT Tagged-Token Dataflow Architecture
IEEE Transactions on Computers
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Energy-efficient hybrid wakeup logic
Proceedings of the 2002 international symposium on Low power electronics and design
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Direct Instruction Wakeup for Out-of-Order Processors
IWIA '04 Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A scalable low power issue queue for large instruction window processors
Proceedings of the 20th annual international conference on Supercomputing
SPEC CPU2006 benchmark descriptions
ACM SIGARCH Computer Architecture News
NoSQ: Store-Load Communication without a Store Queue
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 34th annual international symposium on Computer architecture
Composable Lightweight Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Amdahl's Law in the Multicore Era
Computer
A fault-tolerant, dynamically scheduled pipeline structure for chip multiprocessors
SAFECOMP'11 Proceedings of the 30th international conference on Computer safety, reliability, and security
Global register alias table: Boosting sequential program on multi-core
Future Generation Computer Systems
Amdahl's law for predicting the future of multicores considered harmful
ACM SIGARCH Computer Architecture News
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Flicker: a dynamically adaptive architecture for power limited multicore systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
A hyperscalar dual-core architecture for embedded systems
Microprocessors & Microsystems
Hi-index | 0.00 |
Chip Multiprocessors (CMPs) are now commodity hardware, but commoditization of parallel software remains elusive. In the near term, the current trend of increased core-per-socket count will continue, despite a lack of parallel software to exercise the hardware. Future CMPs must deliver thread-level parallelism when software provides threads to run, but must also continue to deliver performance gains for single threads by exploiting instruction-level parallelism and memory-level parallelism. However, power limitations will prevent conventional cores from exploiting both simultaneously. This work presents the Forwardflow Architecture, which can scale its execution logic up to run single threads, or down to run multiple threads in a CMP. Forwardflow dynamically builds an explicit internal dataflow representation from a conventional instruction set architecture, using forward dependence pointers to guide instruction wakeup, selection, and issue. Forwardflow's backend is organized into discrete units that can be individually (de-)activated, allowing each core's performance to be scaled by system software at the architectural level. On single threads, Forwardflow core scaling yields a mean runtime reduction of 21% for a 37% increase in power consumption. For multithreaded workloads, a Forwardflow-based CMP allows system software to select the performance point that best matches available power.