dBug: systematic evaluation of distributed systems

  • Authors:
  • Jiri Simsa;Randy Bryant;Garth Gibson

  • Affiliations:
  • Computer Science Department, Carnegie Mellon University;Computer Science Department, Carnegie Mellon University;Computer Science Department, Carnegie Mellon University

  • Venue:
  • SSV'10 Proceedings of the 5th international conference on Systems software verification
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents the design, implementation and evaluation of "dBug" - a tool that leverages manual instrumentation for systematic evaluation of distributed and concurrent systems. Specifically, for a given distributed concurrent system, its initial state and a workload, the dBug tool systematically explores possible orders in which concurrent events triggered by the workload can happen. Further, dBug optionally uses the partial order reduction mechanism to avoid exploration of equivalent orders. Provided with a correctness check, the dBug tool is able to verify that all possible serializations of a given concurrent workload execute correctly. Upon encountering an error, the tool produces a trace that can be replayed to investigate the error. We applied the dBug tool to two distributed systems - the Parallel Virtual File System (PVFS) implemented in C and the FAWN-based key-value storage (FAWN-KV) implemented in C++. In particular, we integrated both systems with dBug to expose the non-determinism due to concurrency. This mechanism was used to verify that the result of concurrent execution of a number of basic operations from a fixed initial state meets the high-level specification of PVFS and FAWN-KV. The experimental evidence shows that the dBug tool is capable of systematically exploring behaviors of a distributed system in a modular, practical, and effective manner.