Real-time problem diagnosis in large distributed computer systems and networks is a challenging task that requires fast and accurate inference from potentially huge volumes of data. In this paper, we propose a cost-efficient, adaptive diagnostic technique called active probing. Probes are end-to-end test transactions that collect information about the performance of a distributed system. Active probing combines probabilistic reasoning with an information-theoretic approach, and enables fast online inference about the current system state by actively selecting only a small number of the most informative tests. We demonstrate empirically that, compared with nonadaptive (preplanned) probing schemes, active probing greatly reduces both the number of probes (by 60% to 75% in most of our real-life applications) and the time needed to localize the problem. We also provide some theoretical results on the complexity of probe selection and on the effect of "noisy" probes on the accuracy of diagnosis. Finally, we discuss how to model the system's dynamics using dynamic Bayesian networks (DBNs) and an efficient approximate approach called sequential multifault; empirical results demonstrate a clear advantage of such approaches over "static" techniques that do not handle changes in the system.
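To make the idea of actively selecting the most informative probe concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes at most one faulty node, noise-free probes (a probe fails exactly when the faulty node lies on its path), and a greedy rule that picks the probe with the lowest expected posterior entropy. The four-node topology, the probe names, and all function names are hypothetical.

```python
import math


def entropy(belief):
    """Shannon entropy (in bits) of a distribution over fault states."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def split(belief, probe_nodes):
    """Partition the belief by the outcome each state would produce."""
    fail = {s: p for s, p in belief.items() if s in probe_nodes}
    ok = {s: p for s, p in belief.items() if s not in probe_nodes}
    return fail, ok


def normalize(part):
    total = sum(part.values())
    return {s: p / total for s, p in part.items()}


def expected_entropy(belief, probe_nodes):
    """Expected entropy of the belief after observing this probe."""
    result = 0.0
    for part in split(belief, probe_nodes):
        mass = sum(part.values())
        if mass > 0:
            result += mass * entropy(normalize(part))
    return result


def diagnose(probes, prior, run_probe):
    """Greedily run the most informative probe until the fault is isolated."""
    belief = dict(prior)
    remaining = dict(probes)
    used = []
    while remaining and entropy(belief) > 1e-12:
        name = min(remaining, key=lambda n: expected_entropy(belief, remaining[n]))
        if entropy(belief) - expected_entropy(belief, remaining[name]) <= 1e-12:
            break  # no remaining probe can reduce uncertainty
        nodes = remaining.pop(name)
        failed = run_probe(name)
        fail, ok = split(belief, nodes)
        belief = normalize(fail if failed else ok)  # Bayes update, noise-free case
        used.append(name)
    return belief, used


# Hypothetical example: each probe maps to the set of nodes on its path.
probes = {
    "p1": {"A", "B"},
    "p2": {"B", "C"},
    "p3": {"C", "D"},
    "p4": {"A"},
    "p5": {"D"},
}
# Uniform prior over "no fault" ("ok") and a single fault at each node.
prior = {state: 0.2 for state in ("ok", "A", "B", "C", "D")}

true_fault = "C"  # simulated ground truth, unknown to the diagnosis loop
belief, used = diagnose(probes, prior, lambda name: true_fault in probes[name])
print("diagnosis:", max(belief, key=belief.get), "probes used:", used)
```

In this toy setup the loop isolates the fault without running every available probe, whereas a preplanned scheme would typically run all of them. Handling noisy probes or multiple simultaneous faults would require replacing the hard partition in `split` with likelihood-weighted updates, which this sketch does not attempt.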