Approaching Ideal NoC Latency with Pre-Configured Routes

Authors:
George Michelogiannakis; Dionisios Pnevmatikatos; Manolis Katevenis
Affiliations:
Institute of Computer Science, Foundation for Research & Technology-Hellas (FORTH), Greece (all three authors)
Venue:
NOCS '07: Proceedings of the First International Symposium on Networks-on-Chip
Year:
2007

References:
High performance communications in processor networks. ISCA '89: Proceedings of the 16th Annual International Symposium on Computer Architecture.
A survey of wormhole routing techniques in direct networks. Readings in Computer Architecture.
Route packets, not wires: on-chip interconnection networks. Proceedings of the 38th Annual Design Automation Conference.
DyAD: smart routing for networks-on-chip. Proceedings of the 41st Annual Design Automation Conference.
Low-Latency Virtual-Channel Routers for On-Chip Networks. Proceedings of the 31st Annual International Symposium on Computer Architecture.
Thermal Modeling, Characterization and Management of On-Chip Networks. Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture.
An Asynchronous NOC Architecture Providing Low Latency Service and Its Multi-Level Design Framework. ASYNC '05: Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems.
A low latency router supporting adaptivity for on-chip interconnects. Proceedings of the 42nd Annual Design Automation Conference.
A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks. Proceedings of the 33rd Annual International Symposium on Computer Architecture.

Abstract

In multi-core ASICs, processors and other compute engines need to communicate with memory blocks and with other cores at a latency as close as possible to the ideal of a direct buffered wire. However, current state-of-the-art networks-on-chip (NoCs) incur, at best, a latency of one clock cycle per hop. We investigate the design of a NoC that offers close-to-ideal latency on a set of preferred, run-time configurable paths. Processors and other compute engines may reconfigure the network to guarantee low latency over different sets of paths as needed. Flits on non-preferred paths are given lower priority than flits on preferred ones and, in the absence of contention, incur a delay of one clock cycle per hop. To achieve this, we use the "mad postman" [5] technique: every incoming flit is eagerly (i.e., speculatively) forwarded to its input's preferred output, if one exists. This costs only the delay of a single pre-enabled tri-state driver. We then check whether that decision was correct and, if not, forward the flit to the proper output. Incorrectly forwarded flits are classified as dead and are eliminated at subsequent hops. We use a 2D mesh topology tailored to processor-memory communication, together with a modified version of XY routing that remains deadlock-free. Our evaluation in a 130 nm technology shows that, on the preferred paths, our approach achieves a typical latency of around 500 ps, compared with 1500 ps for a full clock cycle and 135 ps for an ideal direct connection; non-preferred paths incur a delay of one clock cycle per hop, similar to that of other approaches. The performance gains are significant, and the approach may also prove useful in other application domains.
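The speculate-then-check mechanism described in the abstract can be illustrated with a small behavioural model. What follows is a minimal sketch in Python, not the authors' hardware design: each hypothetical `Router` holds a run-time configurable map from input port to preferred output port, eagerly forwards every live flit there, and then compares that guess with the output chosen by plain XY routing. Plain XY routing is an assumption standing in for the paper's modified, deadlock-free variant, which is not reproduced here. A correct guess counts as a "fast" hop (driver and wire delay only); a wrong guess leaves a dead copy on the preferred output, which the next router discards, and costs one clock cycle to forward the flit properly. For simplicity the model tags the copy dead when it is emitted (in hardware it is already on the wire before the check completes) and does not model the arbitration that gives preferred-path flits priority. All names (`Router`, `Flit`, `xy_route`, `traverse`) are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]

# Neighbour offset for each output port, and the input port a flit
# arrives on at the next router after leaving through that output.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}


@dataclass
class Flit:
    dest: Pos
    dead: bool = False  # marks a mis-speculated copy


def xy_route(here: Pos, dest: Pos) -> str:
    """Plain XY routing (assumption: stands in for the paper's modified XY)."""
    if dest[0] != here[0]:
        return "E" if dest[0] > here[0] else "W"
    if dest[1] != here[1]:
        return "N" if dest[1] > here[1] else "S"
    return "LOCAL"


class Router:
    def __init__(self, pos: Pos, preferred: Optional[Dict[str, str]] = None):
        self.pos = pos
        # Run-time configurable map: input port -> preferred output port.
        self.preferred = dict(preferred or {})

    def receive(self, in_port: str, flit: Flit) -> List[Tuple[int, str, Flit]]:
        """Return (extra_cycles, out_port, flit) events for one incoming flit."""
        if flit.dead:
            return []  # dead copies are eliminated on arrival
        correct = xy_route(self.pos, flit.dest)
        pref = self.preferred.get(in_port)
        if pref == correct:
            # Speculation was right: the flit already left through the
            # pre-enabled driver, so no clock cycle is added at this hop.
            return [(0, pref, flit)]
        events = []
        if pref is not None:
            # Speculation was wrong: the eager copy on `pref` is dead.
            events.append((0, pref, Flit(flit.dest, dead=True)))
        # Route computation plus proper forwarding: one clock cycle.
        events.append((1, correct, flit))
        return events


def traverse(routers: Dict[Pos, Router], src: Pos, dest: Pos) -> Tuple[int, int, int]:
    """Walk one flit from src to dest; count fast hops, one-cycle hops, dead copies."""
    pos, in_port = src, "LOCAL"
    fast = slow = dead = 0
    while True:
        events = routers[pos].receive(in_port, Flit(dest))
        dead += sum(1 for _, _, f in events if f.dead)
        extra, out, _ = next(e for e in events if not e[2].dead)
        if extra == 0:
            fast += 1
        else:
            slow += 1
        if out == "LOCAL":
            return fast, slow, dead
        dx, dy = STEP[out]
        pos, in_port = (pos[0] + dx, pos[1] + dy), OPPOSITE[out]


# A 4x1 row of routers with one preferred path from (0, 0) east to (3, 0).
routers = {(x, 0): Router((x, 0)) for x in range(4)}
routers[(0, 0)].preferred["LOCAL"] = "E"
for x in (1, 2):
    routers[(x, 0)].preferred["W"] = "E"
routers[(3, 0)].preferred["W"] = "LOCAL"

print(traverse(routers, (0, 0), (3, 0)))  # (4, 0, 0)
print(traverse(routers, (0, 0), (1, 0)))  # (1, 1, 1)
print(traverse(routers, (1, 0), (0, 0)))  # (0, 2, 0)
```

In this configuration, a flit travelling the whole configured west-to-east path takes the fast route at every hop; a flit that leaves the path early pays one cycle (and leaves one dead copy) at the router where the guess fails; and a flit on an unconfigured route pays one cycle per hop, matching the abstract's description of non-preferred paths.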