Termination detection in data-driven parallel computations/applications

Authors:
Ashfaq A. Khokhar;Susanne E. Hambrusch;Erturk Kocalar
Affiliations:
Department of Electrical Engineering and Computer Science, University of Illinois at Chicago, Chicago, IL;Department of Computer Sciences, Purdue University, West Lafayette, IN;WebTV Networks, Mountain View, CA
Venue:
Journal of Parallel and Distributed Computing
Year:
2003

Abstract

High-performance computing applications with data-driven communication and computation characteristics require synchronization routines in the form of eureka, barrier, or termination synchronization. In this paper, we consider termination synchronization for two different execution models, the AP and the APS model. In the AP model, processors are either active or passive and a passive processor can be made active by another active processor. In the APS model, processors can also be in a server state. A passive processor entering the server state does not become active again. In addition, a server processor cannot change the status of other processors. We describe and analyze solutions for both models and present experimental work highlighting the differences between the models. We show that in almost all situations the use of an AP algorithm to detect termination in an APS environment will result in loss of performance. Our experimental work on the Cray T3E provides insight into where and why this performance loss occurs.
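The state rules given in the abstract can be written down as a small illustrative sketch. This is not the authors' algorithm. The names `State`, `may_activate`, `may_self_transition`, and `no_processor_active` are hypothetical. The active-to-passive transition and the simplified quiescence check, which ignores messages in transit, are assumptions not stated in the abstract.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    SERVER = "server"  # used only in the APS model


def may_activate(activator: State, target: State) -> bool:
    """Only an active processor can wake another one, and only a passive
    processor can be woken. A server can neither activate others nor be
    reactivated."""
    return activator is State.ACTIVE and target is State.PASSIVE


def may_self_transition(model: str, old: State, new: State) -> bool:
    """Self-initiated state changes (hypothetical helper).

    model is "AP" or "APS".
    """
    if old is State.ACTIVE and new is State.PASSIVE:
        # Assumption: a processor turns passive when it runs out of work.
        return True
    if model == "APS" and old is State.PASSIVE and new is State.SERVER:
        # From the abstract: a passive processor may enter the server state.
        return True
    return False


def no_processor_active(states) -> bool:
    """Simplified quiescence check (assumption): true when no processor is
    active. A real detector must also account for messages in transit."""
    return all(s is not State.ACTIVE for s in states)
```

Under these rules the server state is absorbing in the APS model: no call to `may_activate` or `may_self_transition` returns true for a processor leaving `State.SERVER`. This is the extra information that an AP-style detector does not use.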