Surviving internet catastrophes

  • Authors:
  • Flavio Junqueira;Ranjita Bhagwan;Alejandro Hevia;Keith Marzullo;Geoffrey M. Voelker

  • Affiliations:
  • Department of Computer Science and Engineering, University of California, San Diego;Department of Computer Science and Engineering, University of California, San Diego;Department of Computer Science and Engineering, University of California, San Diego;Department of Computer Science and Engineering, University of California, San Diego;Department of Computer Science and Engineering, University of California, San Diego

  • Venue:
  • ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we propose a new approach for designing distributed systems to survive Internet catastrophes called informed replication, and demonstrate this approach with the design and evaluation of a cooperative backup system called the Phoenix Recovery Service. Informed replication uses a model of correlated failures to exploit software diversity. The key observation that makes our approach both feasible and practical is that Internet catastrophes result from shared vulnerabilities. By replicating a system service on hosts that do not have the same vulnerabilities, an Internet pathogen that exploits a vulnerability is unlikely to cause all replicas to fail. To characterize software diversity in an Internet setting, we measure the software diversity of host operating systems and network services in a large organization. We then use insights from our measurement study to develop and evaluate heuristics for computing replica sets that have a number of attractive features. Our heuristics provide excellent reliability guarantees, result in low degree of replication, limit the storage burden on each host in the system, and lend themselves to a fully distributed implementation. We then present the design and prototype implementation of Phoenix, and evaluate it on the PlanetLab testbed.