Multi-threaded workloads with significant inter-thread communication create opportunities to reduce power consumption: obtaining data from threads running on remote cores incurs large latencies, and the simple cores lack the architectural resources to hide them. In this work we propose using the drowsy-mode technique to save leakage power in the cores, and leveraging the mesh-based communication fabric to hide the wake-up latency of the core blocks. We observe a potential overall power reduction of around 70% in a generic homogeneous 256-core tile-based multi-core architecture.
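As a rough illustration of why a mesh fabric can hide wake-up latency, the sketch below estimates how many wake-up cycles remain exposed when a drowsy core starts waking at the moment the reply it is waiting for enters the network. This is a minimal sketch under stated assumptions, not the paper's model: the square 16x16 layout, row-major tile numbering, Manhattan (dimension-ordered) routing, the wake-up trigger point, and the names `hop_count`, `exposed_wakeup_cycles`, `cycles_per_hop`, and `wakeup_cycles` are all hypothetical choices made for illustration.

```python
MESH_DIM = 16  # assumption: the 256 tiles form a square 16x16 mesh


def tile_coords(tile_id, dim=MESH_DIM):
    """Map a row-major tile id to (row, col); the numbering is an assumption."""
    if not 0 <= tile_id < dim * dim:
        raise ValueError(f"tile id {tile_id} outside a {dim}x{dim} mesh")
    return divmod(tile_id, dim)


def hop_count(src, dst, dim=MESH_DIM):
    """Manhattan distance between two tiles, i.e. hops under minimal routing."""
    (r1, c1), (r2, c2) = tile_coords(src, dim), tile_coords(dst, dim)
    return abs(r1 - r2) + abs(c1 - c2)


def exposed_wakeup_cycles(src, dst, cycles_per_hop, wakeup_cycles, dim=MESH_DIM):
    """Wake-up cycles NOT overlapped with the reply's transit time.

    Assumes the waiting core begins waking when the reply is injected at
    `src`, so the transit time of the reply overlaps with the wake-up.
    """
    transit = hop_count(src, dst, dim) * cycles_per_hop
    return max(0, wakeup_cycles - transit)
```

For example, with hypothetical values of 3 cycles per hop and a 10-cycle wake-up, `exposed_wakeup_cycles(0, 255, 3, 10)` shows the corner-to-corner case, where the 30-hop transit fully covers the wake-up, while `exposed_wakeup_cycles(0, 1, 3, 10)` shows a neighbouring tile, where most of the wake-up latency stays visible.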