Fat-trees: universal networks for hardware-efficient supercomputing
IEEE Transactions on Computers
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
Hierarchical Interconnection Networks for Multicomputer Systems
IEEE Transactions on Computers
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Digital systems engineering
The cube-connected cycles: a versatile network for parallel computation
Communications of the ACM
Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Gemini: An Optical Interconnection Network for Parallel Processing
IEEE Transactions on Parallel and Distributed Systems
Scalable Opto-Electronic Network (SOENet)
HOTI '02 Proceedings of the 10th Symposium on High Performance Interconnects HOT Interconnects
GOAL: a load-balanced adaptive routing algorithm for torus networks
Proceedings of the 30th annual international symposium on Computer architecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Adaptive channel queue routing on k-ary n-cubes
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Microarchitecture of a High-Radix Router
Proceedings of the 32nd annual international symposium on Computer Architecture
Topology Optimization of Interconnection Networks
IEEE Computer Architecture Letters
The BlackWidow High-Radix Clos Network
Proceedings of the 33rd annual international symposium on Computer Architecture
Adaptive routing in high-radix clos network
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Flattened butterfly: a cost-efficient topology for high-radix networks
Proceedings of the 34th annual international symposium on Computer architecture
The Cray BlackWidow: a highly scalable vector multiprocessor
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Indirect adaptive routing on large scale interconnection networks
Proceedings of the 36th annual international symposium on Computer architecture
Firefly: illuminating future network-on-chip with nanophotonics
Proceedings of the 36th annual international symposium on Computer architecture
HyperX: topology, routing, and packaging of efficient large-scale networks
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Low-cost router microarchitecture for on-chip networks
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A case for dynamic frequency tuning in on-chip networks
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Next generation on-chip networks: what kind of congestion control do we need?
Hotnets-IX Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks
A first approach to king topologies for on-chip networks
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Scalable and cost-effective interconnection of data-center servers using dual server ports
IEEE/ACM Transactions on Networking (TON)
RAFT: A router architecture with frequency tuning for on-chip networks
Journal of Parallel and Distributed Computing
A learning-based approach to the automated design of MPSoC networks
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
A case for heterogeneous on-chip interconnects for CMPs
Proceedings of the 38th annual international symposium on Computer architecture
The role of optics in future high radix switch design
Proceedings of the 38th annual international symposium on Computer architecture
Avoiding hot-spots on two-level direct networks
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Improving communication performance in dense linear algebra via topology aware collectives
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Proceedings of the 9th conference on Computing Frontiers
Jellyfish: networking data centers randomly
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Exploiting communication and packaging locality for cost-effective large scale networks
Proceedings of the 26th ACM international conference on Supercomputing
A case for random shortcut topologies for HPC interconnects
Proceedings of the 39th Annual International Symposium on Computer Architecture
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Design and evaluation of Mesh-of-Tree based Network-on-Chip using virtual channel router
Microprocessors & Microsystems
ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
Looking under the hood of the IBM blue gene/Q network
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Cray cascade: a scalable HPC system based on a Dragonfly network
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Collectives on two-tier direct networks
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
International Journal of Distributed Systems and Technologies
The power 775 architecture at scale
Proceedings of the 27th international ACM conference on International conference on supercomputing
Randomizing task placement does not randomize traffic (enough)
Proceedings of the 2013 Interconnection Network Architecture: On-Chip, Multi-Chip
Global misrouting policies in two-level hierarchical networks
Proceedings of the 2013 Interconnection Network Architecture: On-Chip, Multi-Chip
Obtaining the optimal configuration of high-radix Combined switches
Journal of Parallel and Distributed Computing
Scalable high-radix router microarchitecture using a network switch organization
ACM Transactions on Architecture and Code Optimization (TACO)
BBQ: a straightforward queuing scheme to reduce hol-blocking in high-performance hybrid networks
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Network interface design for low latency request-response protocols
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Dahu: commodity switches for direct connect data center networks
ANCS '13 Proceedings of the ninth ACM/IEEE symposium on Architectures for networking and communications systems
Optimal networks from error correcting codes
ANCS '13 Proceedings of the ninth ACM/IEEE symposium on Architectures for networking and communications systems
Performance implications of remote-only load balancing under adversarial traffic in Dragonflies
Proceedings of the 8th International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
A reconfigurable, regular-topology cluster/datacenter network using commodity optical switches
Future Generation Computer Systems
Hi-index | 0.00 |
Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. High-radix networks, however, require longer cables than their low-radix counterparts. Because cables dominate network cost, the number of cables, and particularly the number of long, global cables should be minimized to realize an efficient network. In this paper, we introduce the dragonfly topology which uses a group of high-radix routers as a virtual router to increase the effective radix of the network. With this organization, each minimally routed packet traverses at most one global channel. By reducing global channels, a dragonfly reduces cost by 20% compared to a flattened butterfly and by 52% compared to a folded Clos network in configurations with ≥ 16K nodes.We also introduce two new variants of global adaptive routing that enable load-balanced routing in the dragonfly. Each router in a dragonfly must make an adaptive routing decision based on the state of a global channel connected to a different router. Because of the indirect nature of this routing decision, conventional adaptive routing algorithms give degraded performance. We introduce the use of selective virtual-channel discrimination and the use of credit round-trip latency to both sense and signal channel congestion. The combination of these two methods gives throughput and latency that approaches that of an ideal adaptive routing algorithm.