Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
A Network on Chip Architecture and Design Methodology
ISVLSI '02 Proceedings of the IEEE Computer Society Annual Symposium on VLSI
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
SPIN: A Scalable, Packet Switched, On-Chip Micro-Network
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe: Designers' Forum - Volume 2
Application-specific buffer space allocation for networks-on-chip router design
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing
ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Implementation and Evaluation of a Dynamically Routed Processor Operand Network
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Router with centralized buffer for network-on-chip
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Routability of network topologies in FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A case for bufferless routing in on-chip networks
Proceedings of the 36th annual international symposium on Computer architecture
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Low power nanoscale buffer management for network on chip routers
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Evaluating Bufferless Flow Control for On-chip Networks
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
X-Network: An Area-Efficient and High-Performance On-Chip Wormhole-Switching Network
HPCC '10 Proceedings of the 2010 IEEE 12th International Conference on High Performance Computing and Communications
Area and power-efficient innovative congestion-aware Network-on-Chip architecture
Journal of Systems Architecture: the EUROMICRO Journal
A case for heterogeneous on-chip interconnects for CMPs
Proceedings of the 38th annual international symposium on Computer architecture
Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees
Proceedings of the 38th annual international symposium on Computer architecture
HPC-Mesh: A Homogeneous Parallel Concentrated Mesh for Fault-Tolerance and Energy Savings
Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems
CONNECT: re-examining conventional wisdom for designing nocs in the context of FPGAs
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Adaptive Backpressure: Efficient buffer management for on-chip networks
ICCD '12 Proceedings of the 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
Hi-index | 0.00 |
Packet-switching networks on chip (NoCs) have emerged as a promising paradigm for designing scalable communication infrastructures for future chip many-core processors and complex Systems on Chip (SoCs). However, the quest for high-performance networks has led to very area-consuming and complicated routers. Buffers consume a significant portion of the router area, but their utilization is very low most of the time. This paper presents a low-area and high-performance wormhole-switching NoC named X-Network that is built on a novel PE (Processing Element)-router organization. In X-Network, each router is shared by four PEs and each general PE has access to four directly-connected routers in addition to NEWS (North, East, West, South) connections between neighboring PEs. By sharing routers among PEs, the network reduces the average hop count for a packet thereby reducing the latency and improving the throughput of the network. Our design not only reduces the total number of routers for a given number of PEs, but also offers much more routing flexibility compared to existing mesh-based solutions. Extensive simulation results using both synthetic workloads and SPLASH-2 applications show that X-Network reduces the network latency by up to 50.3% for a system with 64 PEs. The network saturation point is extended by up to approximately 100% using the fully-adaptive routing algorithm. Our proposed hybrid buffer design can improve the performance by additional 22%.