Relational database: a practical foundation for productivity
Communications of the ACM
Communications of the ACM
BloomUnit: declarative testing for distributed programs
DBTest '12 Proceedings of the Fifth International Workshop on Testing Database Systems
Untangling cluster management with Helix
Proceedings of the Third ACM Symposium on Cloud Computing
Hi-index | 0.00 |
The increasing deployment of distributed systems to solve large data and computational problems has not seen a concomitant increase in tools and techniques to test these systems. In this paper, we propose a data driven approach to testing. We translate our intuitions and expectations about how the system should behave into invariants, the truth of which can be verified from data emitted by the system. Our particular implementation of the invariants uses Q, a high-performance analytical database, programmed with a vector language. To show the practical value of this approach, we describe how it was used to test Helix, a distributed cluster manager deployed at LinkedIn. We make the case that looking at testing as an exercise in data analytics has the following benefits. It (a) increases the expressivity of the tests (b) decreases their fragility and (c) suggests additional, insightful ways to understand the system under test. As the title of the paper suggests, there is truth in the data --- we only need to look for it.