Brief announcement: techniques for programmatically troubleshooting distributed systems

  • Authors:
  • Sam Whitlock; Colin Scott; Scott Shenker

  • Affiliations:
  • International Computer Science Institute, Berkeley, CA, USA; University of California Berkeley, Berkeley, CA, USA; International Computer Science Institute & University of California Berkeley, Berkeley, CA, USA

  • Venue:
  • Proceedings of the 2013 ACM Symposium on Principles of Distributed Computing
  • Year:
  • 2013

Abstract

The distributed systems research community has developed many provably correct algorithms and abstractions that are in wide use. However, practical implementations of distributed systems often contain many bugs, and practitioners spend much of their time troubleshooting these bugs. In this paper we present an algorithm, retrospective causal inference, to ease the burden of troubleshooting. We end by enumerating several open research problems related to the troubleshooting process.