Pipelined broadcast on Ethernet switched clusters

Authors: Pitch Patarasuk; Ahmad Faraj; Xin Yuan
Affiliations: Department of Computer Science, Florida State University, Tallahassee, FL (all three authors)
Venue: IPDPS'06 Proceedings of the 20th International Conference on Parallel and Distributed Processing
Year: 2006

Abstract

We consider unicast-based pipelined broadcast schemes for clusters connected by multiple Ethernet switches. By splitting a large broadcast message into segments and broadcasting the segments in a pipelined fashion, pipelined broadcast can achieve very high performance. We develop algorithms for computing various contention-free broadcast trees on Ethernet switched clusters that are suitable for pipelined broadcast, and we evaluate the schemes through experimentation. Our theoretical and experimental study leads to the following conclusions. First, pipelined broadcast can be more effective than other common broadcast schemes, including those used in the latest versions of MPICH and LAM/MPI, when the message size is sufficiently large. Second, contention-free broadcast trees are essential for pipelined broadcast to achieve high performance. Finally, while it is difficult to determine the optimal message segment size for pipelined broadcast, finding one size that gives good performance is relatively easy.
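To make the segmentation idea and the segment-size trade-off concrete, here is a minimal cost-model sketch. It is not the authors' tree-construction algorithm. It assumes a simple Hockney-style model where sending n bytes over one hop costs alpha + n*beta, a linear (chain) broadcast tree, and that every hop is contention-free, so all hops can forward segments at the same time. It then compares the result against a non-pipelined binomial tree. The function names (`pipelined_chain_time`, `binomial_tree_time`, `best_segment_size`) and all parameter values (50 us latency, 100 Mbit/s links, 32 processes, 1 MiB message) are hypothetical choices for illustration only.

```python
import math

def pipelined_chain_time(m, s, p, alpha, beta):
    """Modeled time to broadcast m bytes down a chain of p processes using
    s-byte segments. Hockney model (alpha + n*beta per message); assumes
    every hop is contention-free, so all hops can be busy at once."""
    k = math.ceil(m / s)        # number of segments (last one counted as full size)
    hop = alpha + s * beta      # time to forward one segment over one hop
    # The first segment needs p - 1 hops to reach the last process;
    # each of the remaining k - 1 segments arrives one hop time later.
    return (p - 1 + k - 1) * hop

def binomial_tree_time(m, p, alpha, beta):
    """Non-pipelined binomial-tree broadcast: ceil(log2 p) rounds,
    each round forwarding the whole m-byte message."""
    return math.ceil(math.log2(p)) * (alpha + m * beta)

def best_segment_size(m, p, alpha, beta, candidates):
    """Return the candidate segment size with the lowest modeled time."""
    return min(candidates, key=lambda s: pipelined_chain_time(m, s, p, alpha, beta))

# Hypothetical parameters: 50 us latency, 100 Mbit/s links (12.5 MB/s),
# 32 processes, 1 MiB message.
ALPHA, BETA = 50e-6, 1 / 12.5e6
P, M = 32, 1 << 20
SIZES = [1 << e for e in range(10, 17)]   # 1 KiB .. 64 KiB

tree = binomial_tree_time(M, P, ALPHA, BETA)
best = best_segment_size(M, P, ALPHA, BETA, SIZES)
pipe = pipelined_chain_time(M, best, P, ALPHA, BETA)

print(f"binomial tree: {tree:.4f} s")
print(f"pipelined chain, {best}-byte segments: {pipe:.4f} s")
for s in SIZES:
    print(f"  segment {s:6d} B -> {pipelined_chain_time(M, s, P, ALPHA, BETA):.4f} s")
```

Under these made-up parameters the model picks 4096-byte segments (about 0.108 s versus about 0.420 s for the binomial tree), and 2 KiB and 8 KiB segments come within roughly 8% of that, which is one way to see why a single "good enough" segment size is easy to find even when the exact optimum is not. For a 1 KiB message the same model favors the binomial tree, which is consistent with pipelining paying off only when the message is large. The contention-free assumption is what keeps each hop at alpha + s*beta; if two tree edges shared a directed link, those hops would slow down and the pipeline would stall at the slowest stage.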