In data veritas: data driven testing for distributed systems

  • Authors:
  • Ramesh Subramonian;Kishore Gopalakrishna;Kapil Surlaker;Bob Schulman;Mihir Gandhi;Sajid Topiwala;David Zhang;Zhen Zhang

  • Affiliations:
  • LinkedIn;LinkedIn;LinkedIn;LinkedIn;LinkedIn;LinkedIn;LinkedIn;LinkedIn

  • Venue:
  • Proceedings of the Sixth International Workshop on Testing Database Systems
  • Year:
  • 2013

Abstract

The increasing deployment of distributed systems to solve large data and computational problems has not seen a concomitant increase in tools and techniques to test these systems. In this paper, we propose a data-driven approach to testing. We translate our intuitions and expectations about how the system should behave into invariants, the truth of which can be verified from data emitted by the system. Our particular implementation of the invariants uses Q, a high-performance analytical database, programmed with a vector language. To show the practical value of this approach, we describe how it was used to test Helix, a distributed cluster manager deployed at LinkedIn. We make the case that looking at testing as an exercise in data analytics has the following benefits: it (a) increases the expressivity of the tests, (b) decreases their fragility, and (c) suggests additional, insightful ways to understand the system under test. As the title of the paper suggests, there is truth in the data; we only need to look for it.
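
As an illustration of the idea of checking an invariant against data emitted by a system, the following is a minimal sketch in Python rather than Q. It is not the authors' implementation. The event format, the state names, the sample log, and the function `single_master_violations` are all assumptions made for this example; the invariant checked ("a partition never has more than one master at a time") is a plausible expectation for a cluster manager, not one quoted from the paper.

```python
from collections import defaultdict

# Hypothetical event log emitted by the system under test:
# (timestamp, partition, node, new_state). Format and values are assumed.
EVENTS = [
    (1, "p0", "n1", "MASTER"),
    (2, "p0", "n2", "SLAVE"),
    (3, "p1", "n2", "MASTER"),
    (4, "p0", "n2", "MASTER"),  # n1 has not yet given up mastership of p0
    (5, "p0", "n1", "SLAVE"),
]


def single_master_violations(events):
    """Replay the log in timestamp order and report every moment at which
    a partition has more than one master.

    Returns a list of (timestamp, partition, sorted_master_nodes).
    """
    masters = defaultdict(set)  # partition -> nodes currently in MASTER state
    violations = []
    for ts, partition, node, new_state in sorted(events):
        if new_state == "MASTER":
            masters[partition].add(node)
        else:
            # Any other state means the node is no longer master.
            masters[partition].discard(node)
        if len(masters[partition]) > 1:
            violations.append((ts, partition, sorted(masters[partition])))
    return violations


if __name__ == "__main__":
    for ts, partition, nodes in single_master_violations(EVENTS):
        print(f"t={ts}: partition {partition} has masters {nodes}")
```

The check is written against the recorded data only, so the same log can be re-examined later with additional invariants without rerunning the system.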