A comparative study of arbitration algorithms for the Alpha 21364 pipelined router

Authors:
Shubhendu S. Mukherjee;Federico Silla;Peter Bannon;Joel Emer;Steve Lang;David Webb
Affiliations:
Intel Corporation, Shrewsbury, MA;Universidad Politecnica de Valencia, Valencia, Spain;Hewlett-Packard, Shrewsbury, MA;Intel Corporation, Shrewsbury, MA;Intel Corporation, Shrewsbury, MA;Hewlett-Packard, Shrewsbury, MA
Venue:
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Year:
2002

Abstract

Interconnection networks usually consist of a fabric of interconnected routers, which receive packets arriving at their input ports and forward them to appropriate output ports. Unfortunately, network packets moving through these routers are often delayed due to conflicting demand for resources, such as output ports or buffer space. Hence, routers typically employ arbiters that resolve conflicting resource demands to maximize the number of matches between packets waiting at input ports and free output ports. Efficient design and implementation of the algorithm running on these arbiters is critical to maximizing network performance.

This paper proposes a new arbitration algorithm called SPAA (Simple Pipelined Arbitration Algorithm), which is implemented in the Alpha 21364 processor's on-chip router pipeline. Simulation results show that SPAA significantly outperforms two earlier well-known arbitration algorithms: PIM (Parallel Iterative Matching) and WFA (Wave-Front Arbiter) implemented in the SGI Spider switch. SPAA outperforms PIM and WFA because it exhibits matching capabilities similar to theirs under realistic conditions when many output ports are busy, takes fewer clock cycles to perform the arbitration, and can be pipelined effectively. Additionally, we propose a new prioritization policy called the Rotary Rule, which prevents severe performance degradation from network saturation at high loads by prioritizing packets already in the network over new packets generated by caches or memory.
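As a rough illustration of the matching problem and the prioritization idea described in the abstract, the following minimal Python sketch performs one greedy arbitration round. It is not SPAA, PIM, or WFA as specified in the paper. The `Request` type, the `arbitrate` function, the port names, and the greedy single-pass strategy are all assumptions made for this example. The only ideas taken from the abstract are that each free output port is matched to at most one waiting packet, and that packets already in the network are favored over packets newly injected by caches or memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A packet waiting at an input port (hypothetical model, not from the paper)."""
    input_port: str     # illustrative port name, e.g. "north" or "cache"
    output_port: str    # output port the packet wants to use
    from_network: bool  # True if the packet arrived from another router


def arbitrate(requests, busy_outputs=frozenset()):
    """Run one greedy arbitration round.

    Each free output port is granted to at most one request and each input
    port sends at most one packet. Packets already in the network are
    considered before locally injected ones (the prioritization idea only;
    the paper's actual mechanism is not reproduced here).
    """
    # sorted() is stable, so arrival order is kept within each priority class.
    ordered = sorted(requests, key=lambda r: not r.from_network)
    granted = {}         # output_port -> winning Request
    used_inputs = set()  # input ports that have already been matched
    for req in ordered:
        if req.output_port in busy_outputs or req.output_port in granted:
            continue  # output is unavailable this round
        if req.input_port in used_inputs:
            continue  # this input already won a different output
        granted[req.output_port] = req
        used_inputs.add(req.input_port)
    return granted


requests = [
    Request("cache", "east", from_network=False),
    Request("north", "east", from_network=True),
    Request("cache", "south", from_network=False),
]
for output, winner in arbitrate(requests).items():
    print(output, "<-", winner.input_port)
```

In this example the in-network packet from `north` wins the contested `east` output, while the locally generated packet from `cache` still obtains the uncontested `south` output. Passing `busy_outputs={"east"}` models an output port that is already occupied, which is the condition the abstract describes as realistic.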