Recursive Diagonal Torus: An Interconnection Network for Massively Parallel Computers

Authors:
Yulu Yang;Akira Funahashi;Akiya Jouraku;Hiroaki Nishi;Hideharu Amano;Toshinori Sueyoshi
Affiliations:
Nankai Univ., Tianjin, China;Mie Univ., Mie, Japan;Keio Univ., Yokohama, Japan;Real World Computing Partnership, Ibaraki, Japan;Keio Univ., Yokohama, Japan;Kumamoto Univ., Kumamoto, Japan
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2001

Abstract

The Recursive Diagonal Torus (RDT), a class of interconnection networks, is proposed for massively parallel computers with up to $2^{16}$ nodes. By making the best use of a recursively structured diagonal mesh (torus) connection, the RDT achieves a smaller diameter than the hypercube (e.g., 11 for $2^{16}$ nodes) with fewer links per node (8 links per node). A simple routing algorithm, called vector routing, which is near-optimal and easy to implement, is also proposed. Although congestion on the upper-rank tori sometimes degrades performance under random traffic, the RDT performs much better than a 2D or 3D torus in most cases and, under hot-spot traffic, much better than a 2D, 3D, or 4D torus. An RDT router chip that provides message multicast for maintaining cache consistency is available. Implemented in $0.5\,\mu\mathrm{m}$ BiCMOS SOG technology, its versatile functions, including hierarchical multicasting, combining of acknowledge packets, a shooting-down/restart mechanism, and time-out/setup mechanisms, operate at a 60 MHz clock rate.
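To put the quoted degree and diameter in context, the following minimal sketch computes the same two figures for the baseline networks named in the abstract at $2^{16}$ nodes. It uses only the standard closed forms for a binary hypercube and a k-ary n-cube torus; the function names are illustrative, and the RDT row is copied from the abstract rather than derived, since the abstract does not specify the RDT construction. A 3D torus is omitted because $2^{16}$ is not a perfect cube.

```python
def integer_root(value, degree):
    """Return k such that k**degree == value, or raise if none exists."""
    k = 1
    while k ** degree < value:
        k += 1
    if k ** degree != value:
        raise ValueError(f"{value} is not a perfect power of degree {degree}")
    return k


def hypercube(num_nodes):
    """(links per node, diameter) of a binary hypercube with num_nodes nodes."""
    dim = num_nodes.bit_length() - 1
    if 1 << dim != num_nodes:
        raise ValueError("a binary hypercube needs a power-of-two node count")
    # Each node has one link per dimension; the farthest node differs in every bit.
    return dim, dim


def torus(num_nodes, dims):
    """(links per node, diameter) of a k-ary dims-cube torus, assuming k > 2."""
    k = integer_root(num_nodes, dims)
    if k <= 2:
        raise ValueError("this sketch assumes more than two nodes per dimension")
    # Two links per dimension; wraparound halves the worst-case distance per dimension.
    return 2 * dims, dims * (k // 2)


if __name__ == "__main__":
    n = 2 ** 16
    rows = {
        "hypercube": hypercube(n),
        "2D torus": torus(n, 2),
        "4D torus": torus(n, 4),
        "RDT (figures quoted from the abstract)": (8, 11),
    }
    for name, (links, diameter) in rows.items():
        print(f"{name:40s} links/node={links:2d} diameter={diameter}")
```

At this size the hypercube needs 16 links per node for a diameter of 16, the 2D torus has 4 links per node and a diameter of 256, and the 4D torus has 8 links per node and a diameter of 32, against the 8 links per node and diameter of 11 quoted for the RDT.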