The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Spider: A High-Speed Network Interconnect
IEEE Micro
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Leakage power modeling and optimization in interconnection networks
Proceedings of the 2003 international symposium on Low power electronics and design
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Run-time power gating of on-chip routers using look-ahead routing
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
A variable-pipeline on-chip router optimized to traffic pattern
Proceedings of the Third International Workshop on Network on Chip Architectures
FlexiBuffer: reducing leakage power in on-chip network routers
Proceedings of the 48th Design Automation Conference
NoRD: Node-Router Decoupling for Effective Power-gating of On-Chip Routers
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Enabling power efficiency through dynamic rerouting on-chip
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Hi-index | 0.00 |
This paper proposes an ultra fine-grained run-time power gating of on-chip router, in which power supply to each router component (e.g., VC queue, crossbar MUX, and output latch) can be individually controlled in response to the applied workload.As only the router components which are just transferring a packet are activated, the leakage power of the on-chip network can be reduced to the near-optimal level.However, a certain amount of wakeup latency is required to activate the sleeping components, and the application performance will be degraded.In this paper, we estimate the wakeup latency for each component based on circuit simulations using a 65nm process.Then we propose four early wakeup methods to overcome the wakeup latency.The proposed router with the early wakeup methods is evaluated in terms of the application performance, area, and leakage power.As a result, it reduces the leakage power by 78.9%, at the expense of the 4.3% area and 4.0% performance when we assume a 1GHz operation.