WiDGET: Wisconsin decoupled grid execution tiles

Authors:
Yasuko Watanabe; John D. Davis; David A. Wood
Affiliations:
University of Wisconsin-Madison, Madison, WI, USA; Microsoft Research - Silicon Valley Lab, Mountain View, CA, USA; University of Wisconsin-Madison, Madison, WI, USA
Venue:
Proceedings of the 37th annual international symposium on Computer architecture
Year:
2010

Abstract

The recent paradigm shift to multi-core systems results in high system throughput within a specified power budget. However, future systems still require good single-thread performance (no longer the predominant design priority) to mitigate sequential bottlenecks and/or to guarantee service-level agreements. Unfortunately, the near saturation of voltage scaling necessitates a long-term alternative to dynamic voltage and frequency scaling. We propose an energy-proportional computing infrastructure, called WiDGET, that decouples thread context management from a sea of simple execution units (EUs). WiDGET's decoupled design provides flexibility to alter resource allocation for a particular power-performance target while turning off unallocated resources. In other words, WiDGET enables dynamic customization of different combinations of small and/or powerful cores on a single chip, consuming power in proportion to the delivered performance. Over all SPEC CPU2006 benchmarks, WiDGET provides average per-thread performance that is 26% better than a Xeon-like processor while using 8% less power. WiDGET can also scale down to a level comparable to an Atom-like processor, turning off resources to reduce average power by 58%. WiDGET achieves high power efficiency (BIPS^3/W), exceeding Xeon-like and Atom-like processors by up to 2x and 21x, respectively.
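
The power-efficiency metric named in the abstract is BIPS cubed per watt, where BIPS is billions of instructions per second. The following is a minimal sketch of how such a metric can be computed and compared between two designs. The function names `bips3_per_watt` and `relative_efficiency` are hypothetical and do not come from the paper, and the sample call simply plugs in the average performance and power ratios quoted in the abstract as an illustration. The result need not match the efficiency ratios reported in the abstract, which come from the paper's own measurements.

```python
def bips3_per_watt(bips: float, watts: float) -> float:
    """Return (billions of instructions per second)^3 divided by power in watts.

    Cubing the performance term weights performance more heavily than power.
    """
    if watts <= 0:
        raise ValueError("watts must be positive")
    return bips ** 3 / watts


def relative_efficiency(perf_ratio: float, power_ratio: float) -> float:
    """Return the BIPS^3/W of design A relative to design B.

    perf_ratio  -- performance of A divided by performance of B
    power_ratio -- power of A divided by power of B
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio ** 3 / power_ratio


# Illustration only (an assumption, not the paper's methodology):
# 26% higher performance and 8% lower power, taken from the abstract's averages.
example = relative_efficiency(perf_ratio=1.26, power_ratio=0.92)
print(f"relative BIPS^3/W: {example:.2f}x")
```

Because the performance term is cubed, a modest performance gain changes this metric far more than an equal-sized power reduction does.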