Technology-Driven, Highly-Scalable Dragonfly Topology

Authors:
John Kim; William J. Dally; Steve Scott; Dennis Abts
Venue:
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Year:
2008

Abstract

Evolving technology and increasing pin bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. High-radix networks, however, require longer cables than their low-radix counterparts. Because cables dominate network cost, the number of cables, and particularly the number of long, global cables, should be minimized to realize an efficient network. In this paper, we introduce the dragonfly topology, which uses a group of high-radix routers as a virtual router to increase the effective radix of the network. With this organization, each minimally routed packet traverses at most one global channel. By reducing global channels, a dragonfly reduces cost by 20% compared to a flattened butterfly and by 52% compared to a folded Clos network in configurations with ≥ 16K nodes.

We also introduce two new variants of global adaptive routing that enable load-balanced routing in the dragonfly. Each router in a dragonfly must make an adaptive routing decision based on the state of a global channel connected to a different router. Because of the indirect nature of this routing decision, conventional adaptive routing algorithms give degraded performance. We introduce the use of selective virtual-channel discrimination and the use of credit round-trip latency to both sense and signal channel congestion. The combination of these two methods gives throughput and latency that approach those of an ideal adaptive routing algorithm.
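The claim that each minimally routed packet crosses at most one global channel can be checked on a small instance. The sketch below is illustrative and rests on several assumptions that go beyond the abstract. It uses parameters called p (terminals per router), a (routers per group), and h (global channels per router), and it fully connects the routers within each group. It uses g = a*h + 1 groups so that every pair of groups shares exactly one global channel. Global channels are assigned with a simple hypothetical rule: global port j of group G belongs to router j // h and leads to group (G + j + 1) mod g. The helper names build_dragonfly, minimal_route, and hop_counts are hypothetical and not taken from the paper.

```python
def build_dragonfly(p, a, h):
    """Return (g, local, global_) for a small dragonfly (illustrative wiring only).

    Assumptions: routers inside a group are fully connected; there are
    g = a*h + 1 groups, so each pair of groups shares exactly one global channel.
    Routers are (group, index) tuples; channels are frozensets of two routers.
    p (terminals per router) does not affect router-to-router wiring here.
    """
    g = a * h + 1
    local, global_ = set(), set()
    for G in range(g):
        # Fully connected intra-group (local) channels.
        for r1 in range(a):
            for r2 in range(r1 + 1, a):
                local.add(frozenset({(G, r1), (G, r2)}))
        # Global port j of group G is owned by router j // h.
        for j in range(a * h):
            T = (G + j + 1) % g          # destination group of this port
            jt = (G - T - 1) % g         # port in group T that leads back to G
            global_.add(frozenset({(G, j // h), (T, jt // h)}))
    return g, local, global_


def minimal_route(src, dst, g, h):
    """Minimal path: local hop to the exit router, one global hop, local hop to dst."""
    (Gs, _), (Gd, _) = src, dst
    path = [src]
    if Gs != Gd:
        exit_router = (Gs, ((Gd - Gs - 1) % g) // h)
        entry_router = (Gd, ((Gs - Gd - 1) % g) // h)
        if exit_router != src:
            path.append(exit_router)
        path.append(entry_router)
    if path[-1] != dst:
        path.append(dst)
    return path


def hop_counts(path, local, global_):
    """Count (local, global) hops, checking that every hop is a real channel."""
    n_local = n_global = 0
    for u, v in zip(path, path[1:]):
        edge = frozenset({u, v})
        if edge in global_:
            n_global += 1
        elif edge in local:
            n_local += 1
        else:
            raise ValueError(f"{u} -> {v} is not a channel")
    return n_local, n_global


if __name__ == "__main__":
    p, a, h = 2, 4, 2  # a small balanced example with a = 2p = 2h
    g, local, global_ = build_dragonfly(p, a, h)
    routers = [(G, r) for G in range(g) for r in range(a)]
    worst = max(hop_counts(minimal_route(s, d, g, h), local, global_)[1]
                for s in routers for d in routers)
    print("groups:", g, "terminals:", a * p * g, "router radix:", p + h + a - 1)
    print("max global hops on a minimal route:", worst)
```

Because each group behaves like one virtual router with a*h global ports, a minimal path between groups needs at most one local hop to reach the router that owns the right global port, one global hop, and at most one local hop at the destination group.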