On the Design of a High-Performance Adaptive Router for CC-NUMA Multiprocessors

Authors:
Valentín Puente; José-Ángel Gregorio; Ramón Beivide; Cruz Izu
Affiliations:
Valentín Puente: not listed; José-Ángel Gregorio: IEEE Computer Society; Ramón Beivide: IEEE Computer Society; Cruz Izu: not listed
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2003

Abstract

This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which enables fully adaptive routing with only three virtual channels. In addition, we selectively use output buffers to implement the most utilized virtual paths in order to reduce head-of-line blocking. The careful implementation of these features results in a good trade-off between network performance and hardware cost. The outcome of this research is a High-Performance Adaptive Router (HPAR), which adequately balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation in which HPAR is compared with other adaptive routers that use FIFO input buffering, with or without additional virtual channels to reduce head-of-line blocking. This evaluation considers both the VLSI cost of each router and its performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among the routers tested. Its throughput gains ranged from 10 percent to 40 percent with respect to its most direct rival, which employs more hardware resources. Other results show that HPAR achieves up to 83 percent of its theoretical maximum throughput under random traffic and up to 70 percent when running real applications. Moreover, the observed packet latencies were comparable to those of simpler routers. Therefore, HPAR can be considered a suitable candidate for packet interchange in the next generations of CC-NUMA multiprocessors.
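The abstract credits deadlock freedom to a packet injection mechanism and fully adaptive routing to just three virtual channels. The Python sketch below shows one plausible arrangement, assuming (this is not stated in the abstract) a bubble-style injection rule on a single escape channel that uses dimension-order routing, plus two unrestricted adaptive channels, with virtual cut-through switching on a k-ary n-cube torus. All names, the packet size and the buffer depth are hypothetical, and the selection policy (most free space first) is an arbitrary choice for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical parameters for illustration only; the paper's values may differ.
PACKET_FLITS = 4          # packet length in flits
ESCAPE_VC = 0             # escape channel: dimension-order routing + bubble rule
ADAPTIVE_VCS = (1, 2)     # adaptive channels: any minimal direction

Port = Tuple[int, int]    # (dimension, +1 or -1)


@dataclass
class VCBuffer:
    """Downstream virtual-channel buffer, tracked in flits."""
    capacity_flits: int
    occupied_flits: int = 0

    def free_packets(self) -> int:
        # Whole packets that still fit (virtual cut-through needs a full slot).
        return (self.capacity_flits - self.occupied_flits) // PACKET_FLITS


def make_buffers(n: int, capacity_flits: int = 2 * PACKET_FLITS) -> Dict[Port, List[VCBuffer]]:
    """Three VC buffers behind each output port of a node in a k-ary n-cube."""
    return {(dim, sgn): [VCBuffer(capacity_flits) for _ in range(3)]
            for dim in range(n) for sgn in (+1, -1)}


def productive_ports(cur, dst, k: int) -> List[Port]:
    """Minimal-path output ports in a torus, lowest dimension first.
    When both directions are minimal (offset k/2), the + direction is chosen."""
    ports = []
    for dim, (c, d) in enumerate(zip(cur, dst)):
        fwd = (d - c) % k
        if fwd == 0:
            continue
        ports.append((dim, +1) if fwd <= k - fwd else (dim, -1))
    return ports


def select_output(cur, dst, k: int, buffers: Dict[Port, List[VCBuffer]],
                  prev_escape: Optional[Port]) -> Optional[Tuple[int, int, int]]:
    """Return (dim, sign, vc) for the next hop, or None if the packet must wait.

    prev_escape is the escape-VC port the packet arrived on, or None if it is
    being injected or arrived on an adaptive VC. The caller ejects packets
    with cur == dst before calling this function.
    """
    ports = productive_ports(cur, dst, k)
    if not ports:
        raise ValueError("packet is already at its destination")

    # 1) Adaptive VCs: any minimal port; room for one packet is enough.
    candidates = [(buffers[p][vc].free_packets(), p, vc)
                  for p in ports for vc in ADAPTIVE_VCS
                  if buffers[p][vc].free_packets() >= 1]
    if candidates:
        _, (dim, sgn), vc = max(candidates)
        return dim, sgn, vc

    # 2) Escape VC: dimension-order port with the bubble condition.
    dor = ports[0]
    # Continuing along the same ring needs one free packet slot; entering a
    # ring (injection, dimension change, or leaving an adaptive VC) must
    # leave a spare packet-sized "bubble" behind.
    needed = 1 if prev_escape == dor else 2
    if buffers[dor][ESCAPE_VC].free_packets() >= needed:
        return dor[0], dor[1], ESCAPE_VC
    return None
```

With every adaptive buffer full, a packet entering the escape ring waits unless the downstream escape buffer can hold two packets, while a packet already travelling on that ring needs only one; in this sketch that asymmetry is what keeps each escape ring from filling completely.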
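To make the head-of-line blocking problem concrete, the following Python sketch simulates a generic saturated input-queued crossbar with one FIFO per input, uniform random destinations and random output arbitration. It is a textbook model, not HPAR or any of the routers evaluated in the paper; the function name and parameters are hypothetical.

```python
import random


def fifo_saturation_throughput(n_ports: int, cycles: int, seed: int = 0) -> float:
    """Simulate a saturated n_ports x n_ports switch with one FIFO per input.

    Every input always has packets queued, destinations are uniform random,
    and each output grants one of the head packets contending for it at
    random. Only head packets can compete, so a blocked head also blocks
    the packets queued behind it (head-of-line blocking).
    Returns delivered packets per output per cycle.
    """
    rng = random.Random(seed)
    # Under saturation only the destination of each FIFO head matters.
    heads = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(cycles):
        # Group inputs by the output their head packet requests.
        contenders = {}
        for inp, out in enumerate(heads):
            contenders.setdefault(out, []).append(inp)
        for inps in contenders.values():
            winner = rng.choice(inps)
            # The winner's next queued packet becomes its new head.
            heads[winner] = rng.randrange(n_ports)
            delivered += 1
    return delivered / (n_ports * cycles)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, round(fifo_saturation_throughput(n, 20000), 3))
```

For two ports the exact saturation throughput of this model is 0.75, and as the number of ports grows it approaches 2 − √2 ≈ 0.586, the classical limit of FIFO input queuing under uniform traffic. Buffering per output avoids this particular loss because a packet is never stuck behind one bound for a different output, which is the kind of blocking the abstract's selective output buffers aim to reduce.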
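The abstract reports throughput as a fraction of a theoretical maximum. One common way to derive such a bound is a channel-load argument, assuming a bidirectional k-ary n-cube torus with even k, N = k^n nodes, unit-bandwidth channels, uniform random destinations over all nodes and perfectly balanced minimal routing; the paper's own definition of the maximum may differ. Here λ is the injection rate per node in flits/cycle, \bar{H} the average hop count and γ the load on each unidirectional channel.

```latex
\bar{H} = n \cdot \frac{1}{k}\left(0 + 2\sum_{d=1}^{k/2-1} d + \frac{k}{2}\right)
        = n \cdot \frac{1}{k}\left(\left(\frac{k}{2}-1\right)\frac{k}{2} + \frac{k}{2}\right)
        = \frac{nk}{4}

\gamma = \frac{N \lambda \bar{H}}{2nN} = \frac{\lambda k}{8},
\qquad
\gamma \le 1 \;\Rightarrow\; \lambda^{*} = \frac{8}{k}\ \text{flits/node/cycle}
```

For instance, with k = 8 the bound is 1 flit/node/cycle, so 83 percent of it would be about 0.83 flits/node/cycle; k = 8 is only an example here, not necessarily the configuration evaluated in the paper.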