Cryptographic support for fault-tolerant distributed computing

  • Authors:
  • Yaron Minsky;Robbert van Renesse;Fred B. Schneider;Scott D. Stoller

  • Affiliations:
  • Cornell University, Ithaca, New York;Cornell University, Ithaca, New York;Cornell University, Ithaca, New York;Cornell University, Ithaca, New York

  • Venue:
  • EW 7 Proceedings of the 7th workshop on ACM SIGOPS European workshop: Systems support for worldwide applications
  • Year:
  • 1996

Quantified Score

Hi-index 0.00

Visualization

Abstract

Mobile processes, or agents, have been proposed for a variety of applications in the Internet and other large distributed systems. But little work has been directed at operating-system support for agents. This paper discusses one aspect of the problem---implementing fault-tolerance without specialized hardware.In traditional client-server settings, a central and trusted host may send all messages and receive all replies, thereby implementing a star-shaped communications pattern. In contrast, an agent can execute autonomously at a succession of remote sites without returning to the host that launched it. Thus, computations structured using agents may consume less network-bandwidth in performing tasks that involve multiple hosts. Moreover, for some settings, it is unrealistic to presume the existence of a central host that remains connected to the network---mobile computing and wireless networks are obvious examples.In an open distributed system, agents comprising an application must not only survive (possibly malicious) failures of the hosts they visit, but they must also be resilient to the potentially hostile actions of other hosts. Correctness of a computation should depend only on hosts that would be visited in a failure-free run. We assume that faulty hosts produce erroneous messages, that they can masquerade as other faulty hosts, but that they cannot assume the identities and do not have access to secrets of non-faulty hosts.Replication and voting are necessary to survive malicious behavior by visited hosts. However, faulty hosts that are not visited by agents can confound a naive replica-management scheme by spoofing. With this in mind, we have been investigating protocols for replication and voting in a family of applications. Our protocols use cryptographic techniques in novel ways. Furthermore, our experiments reveal that fast (correct) hosts can mask delays caused by slow ones, so replication actually speeds up some applications.Section 2 characterizes the family of applications treated in this paper. Section 3 describes experiments we ran to explore performance implications of replication and voting in this setting. The role of cryptographic techniques in our protocols is discussed in section 4. Section 5 contains our conclusions.