In the literature, the problem of global termination detection in parallel systems is usually solved by message passing. In shared-memory systems, it can also be solved with exclusively accessible variables protected by locks. In this paper, we present an algorithm that detects global termination in shared-memory asynchronous multiprocessor systems without locking. We assume a reasonable computation model in which concurrent reading requires no locking, and concurrent unsynchronized writing of different values results in an arbitrary one of those values actually being written. For a system of $n$ processors, the algorithm allocates a working space of $2n + 1$ bits. The worst-case time complexity of the algorithm is $n + 2\sqrt{n} + 1$, which we prove is a lower bound under this model of computation.
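The lock-free setting can be illustrated with a small sketch. This is not the paper's algorithm; the class name, the single-writer flag array, and the double-scan check are illustrative assumptions. Each processor owns one flag that only it writes, so no lock is needed, and a detector declares global termination only after two consecutive scans find every flag set, guarding against a processor reactivating between reads.

```python
class TerminationDetector:
    """Illustrative termination detection for n processors without locks.

    Each processor owns one entry of `done` and is its sole writer
    (single-writer flags), so flag updates need no locking. The detector
    declares global termination only after two consecutive scans see every
    flag set, a simple guard against a processor becoming active again
    between reads. (Hypothetical sketch, not the paper's algorithm.)
    """

    def __init__(self, n):
        self.n = n
        self.done = [False] * n      # one flag per processor, one writer each

    def set_done(self, i, value=True):
        self.done[i] = value         # processor i's private write; no lock

    def scan(self):
        return all(self.done)        # one pass over the n flags

    def terminated(self):
        # double scan: both passes must see all flags set
        return self.scan() and self.scan()


det = TerminationDetector(4)
for i in range(4):
    det.set_done(i)
print(det.terminated())  # prints True once every processor has set its flag
```

Note that this sketch costs a full scan per check; the paper's contribution is a scheme that, within a $2n + 1$-bit workspace, achieves the $n + 2\sqrt{n} + 1$ worst-case bound.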