The iSLIP scheduling algorithm for input-queued switches
IEEE/ACM Transactions on Networking (TON)
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
A comparative study of arbitration algorithms for the Alpha 21364 pipelined router
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Spider: A High-Speed Network Interconnect
IEEE Micro
IEEE Transactions on Parallel and Distributed Systems
Symmetric Crossbar Arbiters for VLSI Communication Switches
IEEE Transactions on Parallel and Distributed Systems
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
A near-optimal real-time hardware scheduler for large cardinality crossbar switches
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Flattened butterfly: a cost-efficient topology for high-radix networks
Proceedings of the 34th annual international symposium on Computer architecture
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects
HOTI '07 Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects
The Performance of Multistage Interconnection Networks for Multiprocessors
IEEE Transactions on Computers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Allocator implementations for network-on-chip routers
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks
IEEE Computer Architecture Letters
Benchmarking modern multiprocessors
Benchmarking modern multiprocessors
Hi-index | 0.00 |
This paper introduces packet chaining, a simple and effective method to increase allocator matching efficiency and hence network performance, particularly suited to networks with short packets and short cycle times. Packet chaining operates by chaining packets destined to the same output together, to reuse the switch connection of a departing packet. This allows an allocator to build up an efficient matching over a number of cycles like incremental allocation, but not limited by packet length. For a 64-node 2D mesh at maximum injection rate and with single-flit packets, packet chaining increases network throughput by 15% compared to a highly-tuned router using a conventional single-iteration separable iSLIP allocator, and outperforms significantly more complex allocators. Specifically, it outperforms multiple-iteration iSLIP allocators and wavefront allocators by 10% and 6% respectively, and gives comparable throughput with an augmenting paths allocator. Packet chaining achieves this performance with a cycle time comparable to a single-iteration separable allocator. Packet chaining also reduces average network latency by 22.5% compared to a single-iteration iSLIP allocator. Finally, packet chaining increases IPC up to 46% (16% average) for application benchmarks because short packets are critical in a typical cache-coherent chip multiprocessor.