Combining tentative and definite executions for very fast dependable parallel computing. STOC '91: Proceedings of the Twenty-Third Annual ACM Symposium on Theory of Computing.
Performing work efficiently in the presence of faults. PODC '92: Proceedings of the Eleventh Annual ACM Symposium on Principles of Distributed Computing.
On the complexity of certified write-all algorithms. Journal of Algorithms.
Probabilistic recurrence relations. Journal of the ACM (JACM).
Time-optimal message-efficient work performance in the presence of faults. PODC '94: Proceedings of the Thirteenth Annual ACM Symposium on Principles of Distributed Computing.
Unreliable failure detectors for reliable distributed systems. Journal of the ACM (JACM).
SETI@home—massively distributed computing for SETI. Computing in Science and Engineering.
Performing Tasks on Restartable Message-Passing Processors. WDAG '97: Proceedings of the 11th International Workshop on Distributed Algorithms.
Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication. WDAG '97: Proceedings of the 11th International Workshop on Distributed Algorithms.
On the Quality of Service of Failure Detectors. DSN '00: Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8).
An optimal algorithm for Monte Carlo estimation. FOCS '95: Proceedings of the 36th Annual Symposium on Foundations of Computer Science.
Probabilistic Analysis of a Group Failure Detection Protocol. WORDS '99: Proceedings of the Fourth International Workshop on Object-Oriented Real-Time Dependable Systems.
Toward Maximizing the Quality of Results of Dependent Tasks Computed Unreliably. Theory of Computing Systems.
Reliably executing tasks in the presence of malicious processors. DISC '05: Proceedings of the 19th International Conference on Distributed Computing.
Robust network supercomputing without centralized control. Proceedings of the 30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing.
Robust network supercomputing without centralized control. OPODIS '11: Proceedings of the 15th International Conference on Principles of Distributed Systems.
PODC '12: Proceedings of the 2012 ACM Symposium on Principles of Distributed Computing.
Internet supercomputing is becoming a powerful tool for harnessing massive amounts of computational resources. However, in typical master-worker settings the reliability of the computation crucially depends on the master's ability to trust the results returned by the workers. Fernandez, Georgiou, Lopez, and Santos [12,13] considered a system consisting of a master process and a collection of worker processes that can execute tasks on behalf of the master and that may act maliciously by deliberately returning fallacious results. The master decides on the correctness of the results by assigning the same task to several workers, and it is charged one work unit for each task performed by a worker. The goal is to design an algorithm that enables the master to determine the correct result with high probability and at the least possible cost. Fernandez et al. assume that the number of faulty processes, or the probability of a process acting maliciously, is known to the master. In this paper this assumption is removed. In the setting with $n$ processes and $n$ tasks we consider two different failure models, viz., model ${\mathcal F}_a$, where an $f$-fraction, $0 < f < \frac{1}{2}$, of the processes are faulty, each returning an incorrect result with probability $p$, $0 < p < \frac{1}{2}$, and the master has no a priori knowledge of the values of $p$ and $f$; and model ${\mathcal F}_b$, where at most an $f$-fraction, $0 < f < \frac{1}{2}$, of the processes are faulty, each returning an incorrect result with probability at most $p$, $0 < p < \frac{1}{2}$, and the master knows the values of $f$ and $p$. For model ${\mathcal F}_a$ we provide an algorithm, based on the Stopping Rule Algorithm of Dagum, Karp, Luby, and Ross [10], that estimates $f$ and $p$ with $(\varepsilon,\delta)$-approximation, for any $0 < \varepsilon, \delta < 1$. This algorithm runs in $O(\log n)$ time, with $O(\log^2 n)$ message complexity, $O(\log^2 n)$ task-oriented work, and $O(n \log n)$ total-work complexity. We also provide a randomized algorithm for detecting the faulty processes, i.e., identifying the processes that have a non-zero probability of failure in model ${\mathcal F}_a$, with task-oriented work $O(n)$ and time $O(\log n)$. A lower bound on the total-work complexity of performing $n$ tasks correctly with high probability is also shown.
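For intuition, the Stopping Rule Algorithm of Dagum, Karp, Luby, and Ross that the estimation result builds on can be sketched as follows. This is a generic $(\varepsilon,\delta)$-estimator for the mean of a $[0,1]$-valued random variable, not the paper's full estimation protocol; the function and parameter names are illustrative.

```python
import math
import random

def stopping_rule_estimate(sample, eps, delta):
    """DKLR stopping-rule estimator (sketch).

    Given a zero-argument `sample` returning values in [0, 1] with
    unknown mean mu > 0, returns mu_hat such that
    Pr[|mu_hat - mu| <= eps * mu] >= 1 - delta.
    """
    # Constants from the Stopping Rule Theorem.
    lam = 4 * (math.e - 2) * math.log(2 / delta) / eps ** 2
    upsilon = 1 + (1 + eps) * lam
    # Draw samples until the running sum first reaches upsilon.
    total, n = 0.0, 0
    while total < upsilon:
        total += sample()
        n += 1
    return upsilon / n

# Illustrative use: estimate the failure probability of a simulated
# worker that returns a wrong answer with (unknown) probability 0.3.
random.seed(7)
est = stopping_rule_estimate(
    lambda: 1.0 if random.random() < 0.3 else 0.0, eps=0.2, delta=0.05)
```

The number of samples drawn adapts to the unknown mean: roughly $\Upsilon/\mu$ draws suffice, which is optimal up to constants for this class of estimators.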
Finally, two randomized algorithms to perform $n$ tasks with high probability are given for both failure models, with closely matching upper bounds on total-work and task-oriented work complexities, and time $O(\log n)$.
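The basic redundant-assignment strategy by which the master decides on a result, as described above, can be sketched as a plurality vote over replicated task executions. The names below (`master_execute`, workers modeled as callables, the `redundancy` parameter) are illustrative assumptions, not the specific algorithms analyzed in the paper.

```python
import random
from collections import Counter

def master_execute(task, workers, redundancy, rng=random):
    """Assign `task` to `redundancy` randomly chosen workers and
    accept the plurality answer.

    The master is charged one work unit per worker used; if fewer
    than half of the chosen workers answer incorrectly, the
    plurality answer is the correct result.
    """
    chosen = rng.sample(workers, redundancy)
    answers = [w(task) for w in chosen]
    result, _ = Counter(answers).most_common(1)[0]
    return result, len(answers)  # (accepted result, work charged)

# Illustrative use: 7 honest workers and 3 malicious ones. Any sample
# of 7 workers contains at most 3 malicious ones, so the honest
# answer always wins the vote.
random.seed(3)
workers = ([(lambda t: t * t) for _ in range(7)]
           + [(lambda t: t * t + 1) for _ in range(3)])
result, work = master_execute(5, workers, redundancy=7)
```

The cost tension the paper studies is visible even in this sketch: a larger `redundancy` raises the probability of a correct decision but is charged linearly in work.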