Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multi-level, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, accurate problem localization, we introduce an Inference Graph model, which is well-adapted to user-perceptible problems rooted in conditions giving rise to both partial service degradation and hard faults. Further, we introduce the Sherlock system to discover Inference Graphs in the operational enterprise, infer critical attributes, and then leverage the result to automatically detect and localize problems. To illuminate strengths and limitations of the approach, we provide results from a prototype deployment in a large enterprise network, as well as from testbed emulations and simulations. In particular, we find that taking into account multi-level structure leads to a 30% improvement in fault localization, as compared to two-level approaches.
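To make the idea of multi-level dependencies concrete, the following is a minimal sketch, not the Sherlock implementation. It assumes a small hypothetical topology (`DEPENDS_ON`), a single assumed edge strength (`EDGE_STRENGTH`), a noisy-OR rule for propagating failure upward, and a single-fault hypothesis search. None of these names or modeling choices come from the abstract. The sketch ranks candidate root causes by how well each explains a set of user-visible observations.

```python
from math import prod

# Hypothetical toy topology (an assumption for illustration only).
# Each node maps to the nodes it depends on. Nodes with no dependencies
# are candidate root causes; client_* nodes are user-visible observations.
DEPENDS_ON = {
    "dns_server": [],
    "web_server": [],
    "sql_backend": [],
    "name_lookup": ["dns_server"],
    "page_render": ["web_server", "sql_backend"],
    "client_a_fetch": ["name_lookup", "page_render"],
    "client_b_lookup": ["name_lookup"],
}

# Assumed probability that a failed dependency also breaks its dependent.
EDGE_STRENGTH = 0.9


def failure_probability(node, failed_roots, cache):
    """Propagate failure from root causes up through the levels (noisy-OR)."""
    if node in cache:
        return cache[node]
    deps = DEPENDS_ON[node]
    if not deps:
        p = 1.0 if node in failed_roots else 0.0
    else:
        # The node survives only if every dependency fails to break it.
        p = 1.0 - prod(
            1.0 - EDGE_STRENGTH * failure_probability(d, failed_roots, cache)
            for d in deps
        )
    cache[node] = p
    return p


def score(failed_roots, observations):
    """Likelihood of the observations (node -> True if failing) under a hypothesis."""
    cache = {}
    likelihood = 1.0
    for node, failing in observations.items():
        p = failure_probability(node, failed_roots, cache)
        likelihood *= p if failing else 1.0 - p
    return likelihood


def rank_single_faults(observations):
    """Score every single-root-cause hypothesis, best explanation first."""
    roots = [n for n, deps in DEPENDS_ON.items() if not deps]
    return sorted(((score({r}, observations), r) for r in roots), reverse=True)


if __name__ == "__main__":
    both_fail = {"client_a_fetch": True, "client_b_lookup": True}
    for likelihood, root in rank_single_faults(both_fail):
        print(f"{root}: {likelihood:.4f}")
```

In this toy graph, when both clients report failures the shared `dns_server` root ranks first, because neither `web_server` nor `sql_backend` can explain the failed lookup by client B. A flat two-level model that linked clients directly to servers would have to encode the intermediate `name_lookup` and `page_render` steps by hand, which is the kind of structure the multi-level graph keeps explicit.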