A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling

Authors:
Fengguang Song; Jack Dongarra; Shirley Moore
Affiliations:
EECS Department, University of Tennessee, Knoxville, USA; EECS Department, University of Tennessee, Knoxville, USA and Oak Ridge National Laboratory, Oak Ridge, USA; EECS Department, University of Tennessee, Knoxville, USA
Venue:
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Year:
2009

Abstract

This paper presents an application-level non-blocking multicast scheme for dynamic DAG scheduling on large-scale distributed-memory systems. The multicast scheme takes into account both the network topology and the space requirements of routing tables to achieve scalability. Specifically, we prove that the scheme is deadlock-free and takes at most log N steps to complete. The routing table selects which neighbors to store based on topology IDs and requires only O(log N) space. Although our scheme is built upon MPI point-to-point operations, experimental results show that it performs significantly better than the simple flat-tree method and is comparable to vendors' collective MPI operations.
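
To make the contrast between a flat tree and a logarithmic-depth multicast concrete, the following minimal sketch counts communication steps for both. It is not the scheme from the paper: it uses a generic binomial tree with no topology IDs or routing tables, and it assumes that a process can send one message per step and that N is the number of processes. The function names `flat_tree_steps` and `binomial_tree_schedule` are hypothetical and chosen only for illustration.

```python
def flat_tree_steps(n):
    """Steps needed when the root sends to the other n - 1 processes in turn.

    Assumption: the root issues one send per step.
    """
    return max(n - 1, 0)


def binomial_tree_schedule(n, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Generic binomial-tree multicast (an illustrative stand-in, not the
    paper's topology-aware scheme). In every round, each process that
    already holds the message forwards it to one process that does not,
    so the number of holders doubles.
    """
    rounds = []
    have = 1  # processes with relative rank < have already hold the message
    while have < n:
        sends = []
        for rel in range(have):
            target = rel + have
            if target < n:
                # Map relative ranks back to actual ranks around the root.
                sends.append(((rel + root) % n, (target + root) % n))
        rounds.append(sends)
        have *= 2
    return rounds


if __name__ == "__main__":
    for n in (8, 64, 1000):
        tree_rounds = len(binomial_tree_schedule(n))
        print(f"N={n}: flat tree {flat_tree_steps(n)} steps, "
              f"binomial tree {tree_rounds} rounds")
```

Under these assumptions, the flat tree needs N - 1 steps while the binomial tree finishes in ceil(log2 N) rounds; for N = 8 that is 7 steps versus 3 rounds. A real implementation on MPI would issue the per-round sends with non-blocking point-to-point calls, which this sketch does not model.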