An efficient delay-optimal distributed termination detection algorithm

  • Authors:
  • Nihar R. Mahapatra
  • Shantanu Dutt

  • Affiliations:
  • Department of Electrical & Computer Engineering, Michigan State University, East Lansing, MI 48824-1226, USA
  • Department of Electrical & Computer Engineering, University of Illinois at Chicago, 851 South Morgan Street (M/C 154), Chicago, IL 60607-7053, USA

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2007

Abstract

Distributed termination detection is a fundamental problem in parallel and distributed computing, and numerous schemes with different performance characteristics have been proposed. These schemes, while efficient with respect to one performance metric, prove inefficient in terms of others. A significant drawback shared by all previous methods is that, on most popular topologies, they take Ω(P) time to detect and signal termination after its actual occurrence, where P is the total number of processing elements. Detection delay is arguably the most important metric to optimize, since it is directly related to the amount of idling of computing resources and to the delay in utilizing the results of the underlying computation. In this paper, we present a novel termination detection algorithm that is simultaneously optimal or near-optimal with respect to all relevant performance measures on any topology. In particular, our algorithm has a best-case detection delay of Θ(1) and a finite optimal worst-case detection delay on any topology equal in order terms to the time for an optimal one-to-all broadcast on that topology (which we accurately characterize for an arbitrary topology). On k-ary n-cube tori and meshes, the worst-case delay is Θ(D), where D is the diameter of the target topology. Further, our algorithm has message and computational complexities of Θ(MD+P) in the worst case and, for most applications, Θ(M+P) in the average case (the same as other message-efficient algorithms), and an optimal space complexity of Θ(P), where M is the total number of messages used by the underlying computation. We also give a scheme using counters that greatly reduces the constant associated with the average message and computational complexities, yet does not suffer from the counter-overflow problems of other schemes. Finally, unlike some previous schemes, our algorithm does not rely on first-in first-out (FIFO) ordering of message communication to work correctly.
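
To make the problem setting concrete, the following is a minimal, illustrative sketch (not the paper's algorithm) of the condition a termination detector must certify: every process is idle and no message of the underlying computation is still in transit. The toy Process/System classes and the forwarding rule are hypothetical assumptions made for this example; the check here uses a central observer over global state, whereas the scheme described in the abstract establishes the same condition in a fully distributed, delay-optimal way without assuming FIFO channels.

```python
# Illustrative toy only: a centrally observed, message-counting view of the
# termination condition (all processes idle AND sent == received globally).
# This is NOT the algorithm from the paper; it only shows what "termination"
# means for a distributed computation.

from dataclasses import dataclass, field
from collections import deque
from typing import Deque, List


@dataclass
class Process:
    pid: int
    active: bool = False
    sent: int = 0        # computation messages this process has sent
    received: int = 0    # computation messages this process has received
    inbox: Deque[int] = field(default_factory=deque)

    def step(self, system: "System") -> None:
        """Consume one pending message; possibly generate follow-up work."""
        if self.inbox:
            self.inbox.popleft()
            self.received += 1
            self.active = True
            # Hypothetical work rule: forward to the next process a bounded
            # number of times so the computation eventually quiesces.
            if self.received % 2 == 1 and self.sent < 3:
                system.send(self.pid, (self.pid + 1) % len(system.procs))
        else:
            self.active = False


@dataclass
class System:
    procs: List[Process]

    def send(self, src: int, dst: int) -> None:
        self.procs[src].sent += 1
        self.procs[dst].inbox.append(src)

    def terminated(self) -> bool:
        # Termination = every process idle AND no message in transit
        # (global sent count equals global received count). A real detector
        # must certify this over a consistent global state with no central
        # observer, which is where detection delay and message overhead arise.
        all_idle = all(not p.active and not p.inbox for p in self.procs)
        balanced = sum(p.sent for p in self.procs) == sum(p.received for p in self.procs)
        return all_idle and balanced


if __name__ == "__main__":
    system = System([Process(i) for i in range(4)])
    system.send(0, 1)  # seed the computation with one message
    rounds = 0
    while not system.terminated():
        for p in system.procs:
            p.step(system)
        rounds += 1
    print(f"Termination detected after {rounds} rounds")
```

In this toy setting the observer sees all counters at once, so detection is immediate; the point of the paper's algorithm is to achieve comparably small detection delay (Θ(1) best case, broadcast-time worst case) when no such global view exists.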