Toward online testing of federated and heterogeneous distributed systems

  • Authors:
  • Marco Canini;Vojin Jovanović;Daniele Venzano;Boris Spasojević;Olivier Crameri;Dejan Kostić

  • Affiliations:
  • School of Computer and Communication Sciences, EPFL, Switzerland (all six authors)

  • Venue:
  • USENIX ATC '11: Proceedings of the 2011 USENIX Annual Technical Conference
  • Year:
  • 2011

Abstract

Making distributed systems reliable is notoriously difficult. It is even more difficult to achieve high reliability for federated and heterogeneous systems, i.e., those that are operated by multiple administrative entities and have numerous interoperable implementations. A prime example of such a system is the Internet's inter-domain routing, today based on BGP. We argue that system reliability should be improved by proactively identifying potential faults through online testing. We propose DiCE, an approach that continuously and automatically explores system behavior to check whether the system deviates from its desired behavior. DiCE orchestrates the exploration of relevant system behaviors by subjecting system nodes to many possible inputs that exercise node actions. DiCE starts exploring from the current, live system state, and operates in isolation from the deployed system. We describe our experience in integrating DiCE with an open-source BGP router. We evaluate the prototype's ability to quickly detect origin misconfiguration, a recurring operator mistake that causes Internet-wide outages. We also quantify DiCE's overhead and find it to have marginal impact on system performance.
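
The exploration idea described in the abstract (start from live state, work on an isolated copy, feed it many inputs, check a property) can be illustrated with a minimal Python sketch. This is not DiCE's implementation: `ToyRouter`, `changed_origins`, and `explore` are hypothetical names, the candidate inputs are enumerated by hand rather than generated automatically, and the property is a deliberately simplified stand-in for an origin-misconfiguration check.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyRouter:
    """Hypothetical stand-in for a routing node (not a real BGP router)."""
    routes: dict = field(default_factory=dict)   # prefix -> origin AS
    filtered: set = field(default_factory=set)   # prefixes rejected on import

    def handle_announcement(self, prefix, origin_as):
        # Assumed toy policy: drop filtered prefixes, otherwise accept.
        if prefix in self.filtered:
            return
        self.routes[prefix] = origin_as


def changed_origins(before, after):
    """Toy property check: list prefixes whose origin AS differs."""
    return [p for p, origin in before.items() if after.get(p) != origin]


def explore(live, inputs):
    """Apply each input to a fresh copy of the live node and check the property."""
    violations = []
    for prefix, origin_as in inputs:
        clone = copy.deepcopy(live)  # isolation: the live node is never mutated
        clone.handle_announcement(prefix, origin_as)
        changed = changed_origins(live.routes, clone.routes)
        if changed:
            violations.append(((prefix, origin_as), changed))
    return violations


live = ToyRouter(
    routes={"10.0.0.0/8": 64500, "192.0.2.0/24": 64501},
    filtered={"192.0.2.0/24"},
)
candidate_inputs = [
    ("10.0.0.0/8", 64999),       # would override an existing origin
    ("192.0.2.0/24", 64999),     # blocked by the import filter
    ("198.51.100.0/24", 64502),  # new prefix, no existing origin to override
]
report = explore(live, candidate_inputs)
print(report)
```

In this sketch only the first input is reported, because it is the only one that changes the origin of an already-known prefix, and the live node's routing table is left untouched throughout.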