The rapid emergence of Chip Multi-Processors (CMP) as the de facto microprocessor archetype has highlighted the importance of scalable and efficient on-chip networks. Packet-based Networks-on-Chip (NoC) are gradually cementing themselves as the medium of choice for the multi-/many-core systems of the near future, due to their innate scalability. However, the prominence of the debilitating power wall requires the NoC to also be as energy efficient as possible. To achieve these two antipodal requirements—scalability and energy efficiency—we propose TornadoNoC, an interconnect architecture that employs a novel flow control mechanism. To prevent livelocks and deadlocks, a sequence numbering scheme and a dynamic ring inflation technique are proposed, and their correctness formally proven. The primary objective of TornadoNoC is to achieve substantial gains in (a) scalability to many-core systems and (b) the area/power footprint, as compared to current state-of-the-art router implementations. The new router is demonstrated to provide better scalability to hundreds of cores than an ideal single-cycle wormhole implementation and other scalability-enhanced low-cost routers. Extensive simulations using both synthetic traffic patterns and real applications running in a full-system simulator corroborate the efficacy of the proposed design. Finally, hardware synthesis analysis using commercial 65nm standard-cell libraries indicates that the area and power budgets of the new router are reduced by up to 53% and 58%, respectively, as compared to existing state-of-the-art low-cost routers.