Automated packet trace analysis of TCP implementations
SIGCOMM '97 Proceedings of the ACM SIGCOMM '97 conference on Applications, technologies, architectures, and protocols for computer communication
Web protocols and practice: HTTP/1.1, Networking protocols, caching, and traffic measurement
Web protocols and practice: HTTP/1.1, Networking protocols, caching, and traffic measurement
On the characteristics and origins of internet flow rates
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Jigsaw: solving the puzzle of enterprise 802.11 analysis
Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications
Path-based faliure and evolution management
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Pip: detecting the unexpected in distributed systems
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Towards highly reliable enterprise network services via inference of multi-level dependencies
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
Answering what-if deployment and configuration questions with wise
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Detailed diagnosis in enterprise networks
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Safe and effective fine-grained TCP retransmissions for datacenter communication
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
The nature of data center traffic: measurements & analysis
Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference
Fingerprinting the datacenter: automated classification of performance crises
Proceedings of the 5th European conference on Computer systems
Proceedings of the ACM SIGCOMM 2010 conference
X-trace: a pervasive network tracing framework
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Identifying performance bottlenecks in CDNs through TCP-level monitoring
Proceedings of the first ACM SIGCOMM workshop on Measurements up the stack
Synergy2cloud: introducing cross-sharing of application experiences into the cloud management cycle
Hot-ICE'12 Proceedings of the 2nd USENIX conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services
X-ray: automating root-cause diagnosis of performance anomalies in production software
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Distributed time-aware provenance
Proceedings of the VLDB Endowment
Evaluating MapReduce for profiling application traffic
Proceedings of the first edition workshop on High performance and programmable networking
Adaptive monitoring: a framework to adapt passive monitoring using probing
Proceedings of the 8th International Conference on Network and Service Management
Virtual network diagnosis as a service
Proceedings of the 4th annual Symposium on Cloud Computing
Real-time diagnosis of TCP performance in clouds
Proceedings of the 2013 workshop on Student workhop
Catch the whole lot in an action: rapid precise packet loss notification in data centers
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Network performance problems are notoriously tricky to diagnose, and this is magnified when applications are often split into multiple tiers of application components spread across thousands of servers in a data center. Problems often arise in the communication between the tiers, where either the application or the network (or both!) could be to blame. In this paper, we present SNAP, a scalable network-application profiler that guides developers in identifying and fixing performance problems. SNAP passively collects TCP statistics and socket-call logs with low computation and storage overhead, and correlates across shared resources (e.g., host, link, switch) and connections to pinpoint the location of the problem (e.g., send buffer mismanagement, TCP/application conflicts, application-generated microbursts, or network congestion). Our one-week deployment of SNAP in a production data center (with over 8,000 servers and over 700 application components) has already helped developers uncover 15 major performance problems in application software, the network stack on the server, and the underlying network.