Internet supercomputing provides a means for harnessing the power of a vast number of interconnected computers. With this come the challenges of marshaling distributed resources and dealing with failures. Traditional centralized approaches employ a master processor and many worker processors that execute a collection of tasks on behalf of the master. Despite the simplicity and advantages of centralized schemes, the master processor is a performance bottleneck and a single point of failure. Additionally, a phenomenon of increasing concern is that workers may return incorrect results, e.g., due to unintended failures, overclocked processors, or workers claiming to have performed work in order to obtain a high rank in the system. This paper develops an original approach that eliminates the master and instead uses a decentralized algorithm in which workers cooperate in performing tasks. The failure model assumes that the average probability of a worker returning a wrong result is less than 1/2. We present a randomized synchronous algorithm for n processors and t tasks (t ≥ n) achieving time complexity $\Theta(\frac{t}{n} \log n)$ and work $\Theta(t \log n)$. It is shown that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is $\Theta(n \log n)$, and the bit complexity is $O(t n \log^3 n)$. Simulations illustrate the behavior of the algorithm under realistic assumptions.
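The abstract's failure model hinges on the average error probability being below 1/2, which is what makes Θ(log n) redundancy with majority voting sufficient for high-probability correctness. The following is a minimal illustrative sketch of that underlying principle only: it simulates replicating a task across Θ(log n) unreliable workers and taking a centralized majority vote, which is a simplification and not the paper's decentralized cooperative algorithm. The function names (`run_task`, `majority_result`, `experiment`) and the parameter values are hypothetical choices for the simulation.

```python
import math
import random

def run_task(correct_result: bool, p: float) -> bool:
    """One unreliable worker: returns the correct result with probability 1 - p."""
    return correct_result if random.random() > p else not correct_result

def majority_result(correct_result: bool, p: float, reps: int) -> bool:
    """Replicate the task across `reps` workers and return True if the
    majority vote recovers the correct result."""
    votes_for_correct = sum(run_task(correct_result, p) for _ in range(reps))
    return votes_for_correct > reps / 2

def experiment(n: int = 1024, p: float = 0.3, trials: int = 1000) -> float:
    """Fraction of trials in which majority voting over Theta(log n)
    redundant executions yields the correct result (here, p < 1/2)."""
    reps = 8 * int(math.log2(n))  # Theta(log n) redundancy per task
    ok = sum(majority_result(True, p, reps) for _ in range(trials))
    return ok / trials

random.seed(1)
print(experiment())  # close to 1.0: correct with high probability when p < 1/2
```

With p < 1/2, a Chernoff bound makes the majority wrong with probability polynomially small in n, which is the standard reason Θ(log n) work per task suffices for the "with high probability" guarantees stated above.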