Service-oriented computing has made it more routine for developers to build large, cross-domain service compositions. These systems inhabit complex, multi-tier operating environments that pose many challenges to their reliable operation. Unanticipated failures at runtime can be time-consuming to diagnose and may propagate across administrative boundaries. It has been argued that measuring readily available data about system operation can significantly improve the failure management capabilities of such systems. We have built Monere, an online monitoring system for cross-domain Web service compositions, and used it in a controlled experiment with human operators to determine how this approach affects diagnosis times for system-level failures. This paper gives an overview of how Monere instruments relevant components across all layers of a service composition and exploits the structure of BPEL workflows to obtain structural cross-domain dependency graphs. Our experiments show a reduction in diagnosis time of more than 20%. However, further analysis shows that this benefit depends on certain conditions, which yields insights into promising directions for effective support of failure diagnosis in large Web service compositions.
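The abstract does not say how Monere builds its dependency graphs. What follows is a minimal, hypothetical sketch of the general idea of deriving a structural dependency graph from BPEL workflows. It assumes WS-BPEL 2.0 process definitions and a hand-supplied mapping from each process's partner links to concrete services and their administrative domains. The functions `invoked_partners`, `dependency_graph`, and `affected_by`, as well as the example services and domains, are illustrative only. They are not Monere's API. The sketch covers only static workflow structure, not the runtime instrumentation the paper describes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# WS-BPEL 2.0 executable-process namespace (OASIS standard).
BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"


def invoked_partners(bpel_xml: str) -> set[str]:
    """Return the partner links named by <invoke> activities in a BPEL process."""
    root = ET.fromstring(bpel_xml)
    partners = {inv.get("partnerLink") for inv in root.iter(f"{{{BPEL_NS}}}invoke")}
    partners.discard(None)  # ignore malformed invokes without a partnerLink
    return partners


def dependency_graph(processes, partner_to_service, service_domain):
    """Build caller -> callees edges and the subset of edges crossing domains.

    processes:          service name -> BPEL XML (only for composite services)
    partner_to_service: (service, partnerLink) -> callee service (hypothetical binding info)
    service_domain:     service -> administrative domain (hypothetical)
    """
    graph = defaultdict(set)
    for service, xml in processes.items():
        for link in invoked_partners(xml):
            graph[service].add(partner_to_service[(service, link)])
    cross = {
        (caller, callee)
        for caller, callees in graph.items()
        for callee in callees
        if service_domain[caller] != service_domain[callee]
    }
    return dict(graph), cross


def affected_by(graph, failed):
    """Return every service that transitively depends on the failed service."""
    reverse = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            reverse[callee].add(caller)
    seen, stack = set(), [failed]
    while stack:
        node = stack.pop()
        for caller in reverse[node]:
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen


# Hypothetical example: a travel composition in domain A that calls a flight
# service in domain B and a hotel broker (itself a BPEL composition) in domain C.
travel_bpel = f"""<process name="TravelBooking" xmlns="{BPEL_NS}">
  <sequence>
    <receive partnerLink="client" operation="book"/>
    <flow>
      <invoke partnerLink="flights" operation="reserve"/>
      <invoke partnerLink="hotels" operation="reserve"/>
    </flow>
    <reply partnerLink="client" operation="book"/>
  </sequence>
</process>"""
hotel_bpel = f"""<process name="HotelBroker" xmlns="{BPEL_NS}">
  <sequence><invoke partnerLink="payment" operation="charge"/></sequence>
</process>"""

processes = {"TravelBooking": travel_bpel, "HotelBroker": hotel_bpel}
partner_to_service = {
    ("TravelBooking", "flights"): "FlightService",
    ("TravelBooking", "hotels"): "HotelBroker",
    ("HotelBroker", "payment"): "PaymentService",
}
service_domain = {"TravelBooking": "A", "FlightService": "B",
                  "HotelBroker": "C", "PaymentService": "C"}

graph, cross = dependency_graph(processes, partner_to_service, service_domain)
print(sorted(affected_by(graph, "PaymentService")))  # ['HotelBroker', 'TravelBooking']
```

Because each composite service contributes its own BPEL definition, the per-process edges compose into one graph that spans administrative domains. Reversing the edges then gives an operator a first candidate set of services whose symptoms may stem from a single upstream failure.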