Parallel processing on networks of workstations: a fault-tolerant, high performance approach

Authors:
P. Dasgupta, Z. M. Kedem, M. O. Rabin
Venue:
ICDCS '95 Proceedings of the 15th International Conference on Distributed Computing Systems
Year:
1995

Cited by (15)

“Dynamic-fault-prone BSP”: a paradigm for robust computations in changing environments
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures

Towards practical deterministic write-all algorithms
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures

A Framework for Automatic Adaptation of Tunable Distributed Applications
Cluster Computing

Experiments with the CHIME Parallel Processing System
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing

Fault-Tolerant Parallel Applications Using Queues and Actions
ICPP '97 Proceedings of the International Conference on Parallel Processing

The Complexity of Synchronous Iterative Do-All with Crashes
DISC '01 Proceedings of the 15th International Conference on Distributed Computing

Metacomputing with MILAN
HCW '99 Proceedings of the Eighth Heterogeneous Computing Workshop

A work-optimal deterministic algorithm for the asynchronous certified write-all problem
Proceedings of the twenty-second annual symposium on Principles of distributed computing

Writing-all deterministically and optimally using a non-trivial number of asynchronous processors
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures

The complexity of synchronous iterative Do-All with crashes
Distributed Computing

On Honey Bees and Dynamic Server Allocation in Internet Hosting Centers
Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems

A tight analysis and near-optimal instances of the algorithm of Anderson and Woll
Theoretical Computer Science

The Do-All problem with Byzantine processor failures
Theoretical Computer Science - Foundations of software science and computation structures

Parallel processing with windows NT networks
NT'97 Proceedings of the USENIX Windows NT Workshop, 1997

Writing-all deterministically and optimally using a nontrivial number of asynchronous processors
ACM Transactions on Algorithms (TALG)

Abstract

One of the most sought-after software innovations of this decade is the construction of systems from off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Using three novel techniques (eager scheduling, evasive memory layouts, and dispersed data management), it is possible to build an execution environment for parallel programs on workstation networks. These techniques were originally developed in a theoretical framework for an abstract machine that models a shared-memory asynchronous multiprocessor. A network of workstations is an inherently asynchronous platform for executing our parallel programs. This gives rise to substantial problems of computational correctness and of automatic load balancing of the work among the processors, so that a slow processor does not hold up the total computation. A limiting case of asynchrony is a processor that becomes infinitely slow, i.e., fails. Our methodology copes with all of these problems, as well as with memory failures. An interesting feature of this system is that it is neither a fault-tolerant system extended for parallel processing nor a parallel processing system extended for fault tolerance: the same novel mechanisms ensure both properties.
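The abstract names eager scheduling but does not spell it out. The following is a minimal sketch under stated assumptions, not the authors' system: it assumes the common reading of the idea (an idle worker is handed any unfinished task, preferring the one dispatched fewest times, and the first completed copy wins) and assumes tasks are idempotent, so running a task twice is harmless. The function `eager_schedule`, its per-worker `task_time` and `crash_at` parameters, and the discrete-event simulation are all hypothetical. A crash is modeled, as in the abstract, as a worker that becomes infinitely slow: its completions after `crash_at` are simply lost.

```python
import heapq
from typing import Callable, Optional


def eager_schedule(
    tasks: list[Callable[[], int]],
    task_time: list[float],
    crash_at: list[Optional[float]],
) -> tuple[list[int], float]:
    """Discrete-event simulation of eager scheduling (hypothetical sketch).

    task_time[w]: time worker w needs per task (large value = slow worker).
    crash_at[w]:  time at which worker w fails, or None if it never fails.
    Tasks are assumed idempotent, so executing one twice is harmless.
    Returns (results, time at which the last missing result arrived).
    """
    n = len(tasks)
    if n == 0:
        return [], 0.0
    results: list[Optional[int]] = [None] * n
    handed_out = [0] * n  # how many copies of each task have been dispatched
    events: list[tuple[float, int, int]] = []  # (finish time, worker, task)

    def dispatch(w: int, now: float) -> None:
        # Eager scheduling: an idle worker takes any unfinished task, preferring
        # the one dispatched fewest times (fresh work first, then re-execution).
        pending = [t for t in range(n) if results[t] is None]
        if not pending:
            return
        t = min(pending, key=lambda i: handed_out[i])
        handed_out[t] += 1
        heapq.heappush(events, (now + task_time[w], w, t))

    for w in range(len(task_time)):
        dispatch(w, 0.0)

    while events:
        now, w, t = heapq.heappop(events)
        if crash_at[w] is not None and now > crash_at[w]:
            # Worker failed mid-task: this copy is lost, but the task stays
            # pending, so surviving workers pick it up with no recovery step.
            continue
        if results[t] is None:
            results[t] = tasks[t]()  # first copy to finish wins; later ones are ignored
        if all(r is not None for r in results):
            return [r for r in results if r is not None], now
        dispatch(w, now)

    raise RuntimeError("every worker failed before all tasks completed")


if __name__ == "__main__":
    squares = [lambda i=i: i * i for i in range(6)]
    # One normal worker, one that fails at t = 1.5, one ten times slower.
    print(eager_schedule(squares, [1.0, 1.0, 10.0], [None, 1.5, None]))
    # ([0, 1, 4, 9, 16, 25], 5.0)
```

In this run all six results arrive at t = 5.0, even though the slow worker's copy of task 2 would not finish until t = 10.0, and the task lost when the second worker fails (task 4) is completed by the surviving worker. Preferring the least-dispatched task is one simple policy, assumed here; it limits duplicate work while still letting the computation finish as long as any worker survives.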