NetCheck: network diagnoses from blackbox traces

Authors:
Yanyan Zhuang;Eleni Gessiou;Steven Portzer;Fraida Fund;Monzur Muhammad;Ivan Beschastnikh;Justin Cappos
Affiliations:
NYU Poly and University of British Columbia;NYU Poly;University of Washington;NYU Poly;NYU Poly;University of British Columbia;NYU Poly
Venue:
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Year:
2014

Abstract

This paper introduces NetCheck, a tool designed to diagnose network problems in large and complex applications. NetCheck relies on blackbox tracing mechanisms, such as strace, to automatically collect the sequences of network system call invocations generated by the application hosts. NetCheck performs its diagnosis by (1) totally ordering the distributed set of input traces, and (2) using a network model to identify points in the totally ordered execution where the traces deviate from expected network semantics. Our evaluation demonstrates that NetCheck can diagnose failures in popular and complex applications without relying on any application- or network-specific information. For instance, NetCheck correctly identified the existence of NAT devices, simultaneous network disconnection/reconnection, and platform portability issues. In a more targeted evaluation, NetCheck correctly detected over 95% of the network problems we collected from the bug trackers of projects such as Python, Apache, and Ruby. When applied to traces of faults reproduced in a live network, NetCheck identified the primary cause of the fault in 90% of the cases. Additionally, NetCheck is efficient: it can process a gigabyte-sized trace in about 2 minutes.
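The two steps named in the abstract (totally ordering per-host traces, then checking the merged execution against a network model) can be illustrated with a small sketch. This is not NetCheck's actual algorithm or model; it rests on several assumptions: system calls are already parsed from strace-style output into a hypothetical `Call` record, the "network model" is a toy TCP model (`ToyTcpModel`, hypothetical) that tracks only listening ports, established connections, and bytes in flight, and ordering is a simple greedy merge (`order_and_check`, hypothetical) that applies whichever trace head the model can currently explain. Failed `accept`, `send`, and `recv` calls are not modeled and would be reported as deviations.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One traced system call (hypothetical record, assumed pre-parsed from strace-like output)."""
    host: str   # host whose trace contains the call
    name: str   # "listen", "connect", "accept", "send", or "recv"
    peer: str   # remote host ("" when irrelevant)
    port: int   # port number (0 when irrelevant)
    ret: int    # return value recorded by the tracer


class ToyTcpModel:
    """A deliberately tiny stand-in for a network model (an assumption, not NetCheck's model)."""

    def __init__(self):
        self.listening = set()             # {(host, port)} with a listening socket
        self.backlog = defaultdict(list)   # (host, port) -> clients waiting to be accepted
        self.connected = set()             # {(local, remote)} established endpoints
        self.in_flight = defaultdict(int)  # (src, dst) -> bytes sent but not yet received

    def accepts(self, c):
        """True if the call's recorded return value is consistent with the current state."""
        if c.name == "listen":
            return True  # success or failure of listen is not constrained here
        if c.name == "connect":
            reachable = (c.peer, c.port) in self.listening
            return reachable if c.ret == 0 else not reachable
        if c.name == "accept":
            return c.ret >= 0 and c.peer in self.backlog[(c.host, c.port)]
        if c.name == "send":
            return c.ret >= 0 and (c.host, c.peer) in self.connected
        if c.name == "recv":
            return ((c.host, c.peer) in self.connected
                    and 0 <= c.ret <= self.in_flight[(c.peer, c.host)])
        return False

    def apply(self, c):
        """Update the model state after an accepted call."""
        if c.name == "listen" and c.ret == 0:
            self.listening.add((c.host, c.port))
        elif c.name == "connect" and c.ret == 0:
            self.backlog[(c.peer, c.port)].append(c.host)
            self.connected.add((c.host, c.peer))
        elif c.name == "accept":
            self.backlog[(c.host, c.port)].remove(c.peer)
            self.connected.add((c.host, c.peer))
        elif c.name == "send":
            self.in_flight[(c.host, c.peer)] += c.ret
        elif c.name == "recv":
            self.in_flight[(c.peer, c.host)] -= c.ret


def order_and_check(traces):
    """Greedily merge per-host traces into one total order.

    At each step, apply the first trace head the model accepts. If no head is
    acceptable, report the head of the first non-empty trace as a deviation and skip it.
    """
    model = ToyTcpModel()
    queues = {host: deque(calls) for host, calls in traces.items()}
    order, deviations = [], []
    while any(queues.values()):
        heads = [q[0] for q in queues.values() if q]
        ready = next((c for c in heads if model.accepts(c)), None)
        if ready is None:
            skipped = heads[0]
            deviations.append(skipped)
            queues[skipped.host].popleft()
            continue
        model.apply(ready)
        queues[ready.host].popleft()
        order.append(ready)
    return order, deviations


# Example: the server's last recv returns more bytes than the client ever sent.
traces = {
    "server": [
        Call("server", "listen", "", 80, 0),
        Call("server", "accept", "client", 80, 3),
        Call("server", "recv", "client", 0, 5),
        Call("server", "recv", "client", 0, 4),
    ],
    "client": [
        Call("client", "connect", "server", 80, 0),
        Call("client", "send", "server", 0, 5),
    ],
}
order, deviations = order_and_check(traces)
```

Here the merge interleaves the two traces as listen, connect, accept, send, recv, and the final 4-byte `recv` is reported as the only deviation. The greedy choice keeps the merge cheap: when no trace head fits the model, no later state change can make one fit, because every state change comes from applying some head. A realistic model would also need UDP, non-blocking calls, error codes, and many more socket operations.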