Fault Management in Distributed Systems: A Policy-Driven Approach
Journal of Network and Systems Management
Monere: monitoring of service compositions for failure diagnosis
ICSOC'11 Proceedings of the 9th international conference on Service-Oriented Computing
The goal of a management system in a distributed computing environment is to provide a centralized and coordinated view of an otherwise distributed and heterogeneous collection of hardware and software resources. Management systems monitor, analyze, and control network resources, system resources, and distributed application programs. Many organizations currently depend on mission-critical distributed applications, a trend that will increase as software engineering tools emerge that make it easier to construct distributed applications. We believe that manageability must be built into distributed applications from the beginning rather than added in an ad hoc fashion after they have been developed. Just as design for usability, testability, and maintenance is being addressed in the development process, so must design for manageability. Application manageability is a research issue of particular interest to us. The work described in this paper focuses on instrumenting processes so that they can respond to management requests, generate management reports, and maintain the information required by the management system. We present an instrumentation architecture to support this, a prototype implementation that includes a class library of standard instrumentation, and a methodology for instrumentation.
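The three duties of an instrumented process named in the abstract (responding to management requests, generating management reports, and maintaining management information) can be illustrated with a minimal sketch. It is not the paper's architecture or class library. The names `Counter`, `InstrumentedProcess`, `handle_request`, and `report`, and the `get`/`reset` operations, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Hypothetical sensor: counts occurrences of an application event."""
    name: str
    value: int = 0

    def increment(self, amount: int = 1) -> None:
        self.value += amount


@dataclass
class InstrumentedProcess:
    """Hypothetical holder of the management information for one process."""
    process_id: str
    sensors: dict = field(default_factory=dict)

    def add_sensor(self, sensor: Counter) -> Counter:
        # Maintain information required by the management system.
        self.sensors[sensor.name] = sensor
        return sensor

    def handle_request(self, request: dict) -> dict:
        # Respond to a management request; operation names are assumptions.
        sensor = self.sensors.get(request.get("sensor"))
        if sensor is None:
            return {"status": "error", "reason": "unknown sensor"}
        op = request.get("op")
        if op == "get":
            return {"status": "ok", "value": sensor.value}
        if op == "reset":
            sensor.value = 0
            return {"status": "ok", "value": 0}
        return {"status": "error", "reason": "unknown operation"}

    def report(self) -> dict:
        # Generate a management report: a snapshot of all sensor values.
        return {
            "process": self.process_id,
            "sensors": {name: s.value for name, s in self.sensors.items()},
        }


# Usage: the application updates sensors; a manager queries them.
proc = InstrumentedProcess("order-service-1")
handled = proc.add_sensor(Counter("requests_handled"))
handled.increment()
handled.increment()
get_reply = proc.handle_request({"op": "get", "sensor": "requests_handled"})
snapshot = proc.report()
```

Keeping the sensors inside the process, rather than in an external agent, reflects the abstract's position that manageability should be designed in from the start. A real system would also need a transport between manager and process, which this sketch leaves out.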