NetCheck: network diagnoses from blackbox traces

  • Authors:
  • Yanyan Zhuang; Eleni Gessiou; Steven Portzer; Fraida Fund; Monzur Muhammad; Ivan Beschastnikh; Justin Cappos

  • Affiliations:
  • NYU Poly and University of British Columbia; NYU Poly; University of Washington; NYU Poly; NYU Poly; University of British Columbia; NYU Poly

  • Venue:
  • NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
  • Year:
  • 2014

Abstract

This paper introduces NetCheck, a tool designed to diagnose network problems in large and complex applications. NetCheck relies on blackbox tracing mechanisms, such as strace, to automatically collect the sequences of network system call invocations generated by the application hosts. NetCheck performs its diagnosis by (1) totally ordering the distributed set of input traces and (2) using a network model to identify points in the totally ordered execution where the traces deviate from expected network semantics. Our evaluation demonstrates that NetCheck can diagnose failures in popular and complex applications without relying on any application- or network-specific information. For instance, NetCheck correctly identified the presence of NAT devices, simultaneous network disconnection/reconnection, and platform portability issues. In a more targeted evaluation, NetCheck correctly detected over 95% of the network problems we collected from the bug trackers of projects such as Python, Apache, and Ruby. When applied to traces of faults reproduced in a live network, NetCheck identified the primary cause of the fault in 90% of the cases. Additionally, NetCheck is efficient: it can process a GB-long trace in about 2 minutes.
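The two-step idea in the abstract (order the per-host traces, then check them against a network model) can be illustrated with a toy sketch. This is not NetCheck's algorithm. It assumes a hypothetical, simplified trace format (`Call`), a one-direction stream model (`StreamModel`), and a greedy merge (`order_traces`), all invented here for illustration. The merge admits a host's next call only when the model permits it, and any calls left blocked mark the point where the traces deviate from the model's expected semantics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One traced network system call (hypothetical, simplified format)."""
    op: str          # "connect", "accept", "send", or "recv"
    conn: str        # hypothetical connection id shared by both endpoints
    nbytes: int = 0  # bytes returned by send/recv; unused otherwise


class StreamModel:
    """Toy one-direction stream model: connect before accept, send before recv."""

    def __init__(self) -> None:
        self.connected: set[str] = set()
        self.accepted: set[str] = set()
        self.buffered: dict[str, int] = {}

    def permits(self, call: Call) -> bool:
        if call.op == "connect":
            return call.conn not in self.connected
        if call.op == "accept":
            return call.conn in self.connected and call.conn not in self.accepted
        if call.op == "send":
            return call.conn in self.connected
        if call.op == "recv":
            # A recv returning n bytes needs at least n bytes already sent.
            return (call.conn in self.accepted
                    and self.buffered.get(call.conn, 0) >= call.nbytes)
        return False

    def apply(self, call: Call) -> None:
        if call.op == "connect":
            self.connected.add(call.conn)
        elif call.op == "accept":
            self.accepted.add(call.conn)
        elif call.op == "send":
            self.buffered[call.conn] = self.buffered.get(call.conn, 0) + call.nbytes
        elif call.op == "recv":
            self.buffered[call.conn] -= call.nbytes


def order_traces(traces: dict[str, list[Call]]):
    """Greedily merge per-host traces into one order the model accepts.

    Returns (ordered calls, blocked heads). A non-empty blocked list is the
    point where no host's next call is consistent with the model.
    """
    model = StreamModel()
    pos = {host: 0 for host in traces}
    order: list[tuple[str, Call]] = []
    progressed = True
    while progressed:
        progressed = False
        for host in sorted(traces):
            i = pos[host]
            if i < len(traces[host]) and model.permits(traces[host][i]):
                model.apply(traces[host][i])
                order.append((host, traces[host][i]))
                pos[host] += 1
                progressed = True
                break  # restart the scan from the first host
    blocked = [(h, traces[h][pos[h]]) for h in sorted(traces) if pos[h] < len(traces[h])]
    return order, blocked


if __name__ == "__main__":
    traces = {
        "client": [Call("connect", "c1"), Call("send", "c1", 10)],
        "server": [Call("accept", "c1"), Call("recv", "c1", 10), Call("recv", "c1", 5)],
    }
    order, blocked = order_traces(traces)
    # The server's second recv (5 bytes) cannot be explained by any send.
    print(blocked)
```

A greedy merge keeps the sketch short, but it can commit to an interleaving that a smarter search would reject. A real tool would need a more careful ordering strategy and a far richer model of socket semantics.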