Interposition agents: transparently interposing user code at the system interface
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
On calibrating measurements of packet transit times
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Probability and statistics with reliability, queuing and computer science applications
Probability and statistics with reliability, queuing and computer science applications
SEDA: an architecture for well-conditioned, scalable internet services
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
DPM: A Measurement System for Distributed Programs
IEEE Transactions on Computers
The NetLogger Methodology for High Performance Distributed Systems Performance Analysis
HPDC '98 Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing
ICDCS '99 Proceedings of the 19th IEEE International Conference on Distributed Computing Systems
Performance debugging for distributed systems of black boxes
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Scalability and accuracy in a large-scale network emulator
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Reliability and security in the CoDeeN content distribution network
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Designing a DHT for low latency and high throughput
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Democratizing content publication with coral
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Operating system support for planetary-scale network services
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Path-based faliure and evolution management
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Using magpie for request extraction and workload modelling
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pip: detecting the unexpected in distributed systems
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
SC2D: an alternative to trace anonymization
Proceedings of the 2006 SIGCOMM workshop on Mining network data
Whodunit: transactional profiling for multi-tier applications
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Towards highly reliable enterprise network services via inference of multi-level dependencies
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
BorderPatrol: isolating events for black-box tracing
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
Dynamic dependencies and performance improvement
LISA'08 Proceedings of the 22nd conference on Large installation system administration conference
Change is hard: adapting dependency graph models for unified diagnosis in wired/wireless networks
Proceedings of the 1st ACM workshop on Research on enterprise networking
Troubleshooting thousands of jobs on production grids using data mining techniques
GRID '08 Proceedings of the 2008 9th IEEE/ACM International Conference on Grid Computing
How to keep your head above water while detecting errors
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Macroscope: end-point approach to networked application dependency discovery
Proceedings of the 5th international conference on Emerging networking experiments and technologies
A query language for understanding component interactions in production systems
Proceedings of the 24th ACM International Conference on Supercomputing
How to keep your head above water while detecting errors
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
FlowRank: ranking NetFlow records
Proceedings of the 6th International Wireless Communications and Mobile Computing Conference
Automating network application dependency discovery: experiences, limitations, and new solutions
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Look who's talking: discovering dependencies between virtual machines using CPU utilization
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Mining netflow records for critical network activities
AIMS'10 Proceedings of the Mechanisms for autonomous management of networks and services, and 4th international conference on Autonomous infrastructure, management and security
Diagnosing performance changes by comparing request flows
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Rake: semantics assisted network-based tracing framework
Proceedings of the Nineteenth International Workshop on Quality of Service
OFRewind: enabling record and replay troubleshooting for networks
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
BotTrack: tracking botnets using NetFlow and PageRank
NETWORKING'11 Proceedings of the 10th international IFIP TC 6 conference on Networking - Volume Part I
Using link gradients to predict the impact of network latency on multitier applications
IEEE/ACM Transactions on Networking (TON)
NetPilot: automating datacenter network failure mitigation
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Net-cohort: detecting and managing VM ensembles in virtualized data centers
Proceedings of the 9th international conference on Autonomic computing
Proceedings of the 9th international conference on Autonomic computing
NetPilot: automating datacenter network failure mitigation
ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
An online service-oriented performance profiling tool for cloud computing systems
Frontiers of Computer Science: Selected Publications from Chinese Universities
Carat: collaborative energy diagnosis for mobile devices
Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems
On fault resilience of OpenStack
Proceedings of the 4th annual Symposium on Cloud Computing
Hi-index | 0.00 |
Wide-area distributed applications are challenging to debug, optimize, and maintain. We present Wide-Area Project 5 (WAP5), which aims to make these tasks easier by exposing the causal structure of communication within an application and by exposing delays that imply bottlenecks. These bottlenecks might not otherwise be obvious, with or without the application's source code. Previous research projects have presented algorithms to reconstruct application structure and the corresponding timing information from black-box message traces of local-area systems. In this paper we present (1) a new algorithm for reconstructing application structure in both local- and wide-area distributed systems, (2) an infrastructure for gathering application traces in PlanetLab, and (3) our experiences tracing and analyzing three systems: CoDeeN and Coral, two content-distribution networks in PlanetLab; and Slurpee, an enterprise-scale incident-monitoring system.