DDOS: taming nondeterminism in distributed systems

  • Authors:
  • Nicholas Hunt;Tom Bergan;Luis Ceze;Steven D. Gribble

  • Affiliations:
  • University of Washington, Seattle, USA;University of Washington, Seattle, WA, USA;University of Washington, Seattle, WA, USA;University of Washington, Seattle, WA, USA

  • Venue:
  • Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Nondeterminism complicates the development and management of distributed systems, and arises from two main sources: the local behavior of each individual node as well as the behavior of the network connecting them. Taming nondeterminism effectively requires dealing with both sources. This paper proposes DDOS, a system that leverages prior work on deterministic multithreading to offer: 1) space-efficient record/replay of distributed systems; and 2) fully deterministic distributed behavior. Leveraging deterministic behavior at each node makes outgoing messages strictly a function of explicit inputs. This allows us to record the system by logging just message's arrival time, not the contents. Going further, we propose and implement an algorithm that makes all communication between nodes deterministic by scheduling communication onto a global logical timeline. We implement both algorithms in a system called DDOS and evaluate our system with parallel scientific applications, an HTTP/memcached system and a distributed microbenchmark with a high volume of peer-to-peer communication. Our results show up to two orders of magnitude reduction in log size of record/replay, and that distributed systems can be made deterministic with an order of magnitude of overhead.