Modern Internet systems often combine different applications (e.g., DNS, web, and database), span different administrative domains, and function in the context of network mechanisms like tunnels, VPNs, NATs, and overlays. Diagnosing these complex systems is a daunting challenge. Although many diagnostic tools exist, they are typically designed for a specific layer (e.g., traceroute) or application, and there is currently no tool for reconstructing a comprehensive view of service behavior. In this paper we propose X-Trace, a tracing framework that provides such a comprehensive view for systems that adopt it. We have implemented X-Trace in several protocols and software systems, and we discuss how it works in three deployed scenarios: DNS resolution, a three-tiered photo-hosting website, and a service accessed through an overlay network.