Scalable Hardware-Based Multicast Trees

Authors:
Salvador Coll;José Duato;Fabrizio Petrini;Francisco J. Mora
Affiliations:
Technical University of Valencia, Spain;Technical University of Valencia, Spain;Los Alamos National Laboratory, NM;Technical University of Valencia, Spain
Venue:
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Year:
2003

Abstract

This paper presents an algorithm for implementing optimal hardware-based multicast trees on networks that provide hardware support for collective communication. Although the proposed methodology can be generalized to a wide class of networks, we apply it to the Quadrics network, a state-of-the-art network that provides hardware-based multicast communication. The proposed mechanism is intended to improve the performance of collective communication patterns on the network in those cases where the hardware support cannot be used directly, for instance because some nodes are faulty. This scheme significantly reduces multicast latencies compared to the original system primitives, which use multicast trees based on unicast communication. A backtracking algorithm that finds the optimal solution to the problem is presented. In addition, a greedy algorithm is presented and shown to provide near-optimal solutions. Finally, our experimental results show the good performance and scalability of the proposed multicast tree in comparison to traditional unicast-based multicast trees. Our multicast mechanism doubles barrier synchronization and broadcast performance when compared to the production-level MPI library.
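The idea in the abstract can be illustrated with a small sketch. It is not the paper's backtracking or greedy algorithm. It is a toy model built on two assumptions that the abstract does not state: a hardware multicast can reach only one contiguous range of healthy nodes, and in each round every node that already holds the message can start one multicast (or, in the baseline, one unicast). The source is counted separately from the destinations. The function names `contiguous_ranges`, `greedy_rounds`, and `unicast_rounds` are hypothetical.

```python
def contiguous_ranges(num_nodes, faulty):
    """Split node ids 0..num_nodes-1 into maximal runs of healthy nodes."""
    ranges, start = [], None
    for node in range(num_nodes):
        if node in faulty:
            if start is not None:
                ranges.append((start, node - 1))
                start = None
        elif start is None:
            start = node
    if start is not None:
        ranges.append((start, num_nodes - 1))
    return ranges


def greedy_rounds(range_sizes):
    """Rounds needed in the toy model when the largest ranges are served first.

    Assumption: each informed node sends one hardware multicast per round,
    and every node in a served range becomes a sender in the next round.
    """
    pending = sorted(range_sizes, reverse=True)
    senders, rounds = 1, 0  # only the source holds the message at the start
    while pending:
        served, pending = pending[:senders], pending[senders:]
        senders += sum(served)
        rounds += 1
    return rounds


def unicast_rounds(num_destinations):
    """Rounds for a unicast binomial tree: informed nodes double each round.

    Equals ceil(log2(num_destinations + 1)), computed with integers only.
    """
    return num_destinations.bit_length()


# Example: 16 nodes, nodes 5 and 11 are faulty (hypothetical values).
ranges = contiguous_ranges(16, {5, 11})          # [(0, 4), (6, 10), (12, 15)]
sizes = [hi - lo + 1 for lo, hi in ranges]       # [5, 5, 4]
multicast_steps = greedy_rounds(sizes)           # 2 rounds in this model
unicast_steps = unicast_rounds(sum(sizes))       # 4 rounds for 14 destinations
```

In this model, two faulty nodes split the machine into three ranges, which the multicast tree covers in two rounds, while a unicast binomial tree needs four. The numbers describe the toy model only and say nothing about measured latencies on the Quadrics network.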