Packet chaining: efficient single-cycle allocation for on-chip networks

Authors:
George Michelogiannakis;Nan Jiang;Daniel Becker;William J. Dally
Affiliations:
Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA
Venue:
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2011

Abstract

This paper introduces packet chaining, a simple and effective method for increasing allocator matching efficiency, and hence network performance, that is particularly suited to networks with short packets and short cycle times. Packet chaining operates by chaining together packets destined for the same output, so that they reuse the switch connection of a departing packet. This allows an allocator to build up an efficient matching over a number of cycles, as in incremental allocation, but without being limited by packet length. For a 64-node 2D mesh at maximum injection rate with single-flit packets, packet chaining increases network throughput by 15% compared to a highly tuned router using a conventional single-iteration separable iSLIP allocator, and it outperforms significantly more complex allocators. Specifically, it outperforms multiple-iteration iSLIP allocators and wavefront allocators by 10% and 6%, respectively, and gives throughput comparable to that of an augmenting paths allocator. Packet chaining achieves this performance with a cycle time comparable to that of a single-iteration separable allocator. Packet chaining also reduces average network latency by 22.5% compared to a single-iteration iSLIP allocator. Finally, packet chaining increases IPC by up to 46% (16% on average) for application benchmarks, because short packets are critical in a typical cache-coherent chip multiprocessor.
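
The chaining idea described in the abstract can be illustrated with a toy model. The sketch below is not the authors' router or allocator: the function names (`separable_allocate`, `allocate_cycle`, `run`) are hypothetical, fixed-priority (lowest index wins) arbiters stand in for iSLIP's round-robin pointers, all packets are single-flit, and a connection is chained only when the same input has another packet for the same output. Under those assumptions, it shows how holding connections across cycles lets a single-iteration separable allocator build up a larger matching.

```python
def separable_allocate(requests, free_inputs, free_outputs):
    """Single-iteration, input-first separable allocation.

    requests[i] is the set of outputs that input i wants.
    Assumption: fixed-priority arbiters (lowest index wins), not iSLIP.
    Returns the new grants as {input: output}.
    """
    # Input stage: each free input picks one free output it requests.
    picks = {}
    for i in sorted(free_inputs):
        candidates = sorted(requests[i] & free_outputs)
        if candidates:
            picks[i] = candidates[0]
    # Output stage: each output grants one of the inputs that picked it.
    grants = {}
    granted_outputs = set()
    for i, o in sorted(picks.items()):
        if o not in granted_outputs:
            grants[i] = o
            granted_outputs.add(o)
    return grants


def allocate_cycle(waiting, held, chaining=True):
    """Run one allocation cycle and send one packet per matched pair.

    waiting[i][o] counts single-flit packets at input i bound for output o.
    held is the {input: output} matching used in the previous cycle.
    """
    num_inputs = len(waiting)
    num_outputs = len(waiting[0])
    # Chaining: keep a connection if its input has another packet
    # for the same output, so that packet reuses the connection.
    chained = {}
    if chaining:
        chained = {i: o for i, o in held.items() if waiting[i][o] > 0}
    free_inputs = set(range(num_inputs)) - set(chained)
    free_outputs = set(range(num_outputs)) - set(chained.values())
    requests = [{o for o, n in enumerate(row) if n > 0} for row in waiting]
    new_grants = separable_allocate(requests, free_inputs, free_outputs)
    matching = {**chained, **new_grants}
    for i, o in matching.items():
        waiting[i][o] -= 1
    return matching


def run(waiting, cycles, chaining):
    """Return the matching size per cycle for a copy of `waiting`."""
    waiting = [row[:] for row in waiting]
    held, sizes = {}, []
    for _ in range(cycles):
        held = allocate_cycle(waiting, held, chaining)
        sizes.append(len(held))
    return sizes


# 2x2 switch in which every input has 4 packets for every output.
traffic = [[4, 4], [4, 4]]
print(run(traffic, 4, chaining=False))  # [1, 1, 1, 1]
print(run(traffic, 4, chaining=True))   # [1, 2, 2, 2]
```

In this toy case both inputs pick output 0 in the first cycle, so only one grant is made. With chaining, that connection is held in the following cycle, which removes input 0 and output 0 from contention and lets input 1 win output 1. The fixed-priority baseline exaggerates the gap, since a round-robin allocator such as iSLIP would desynchronize over time; the sketch is meant only to show the mechanism, not to reproduce the paper's measurements.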