Quantitative performance diagnosis (QPD) provides explanations that quantify the impact of problem causes. An example of such an explanation is: "Increased web server traffic accounts for 90% of the increase in LAN utilization, which in turn accounts for 20% of the increase in web response times." This paper describes GAP, a general approach to quantitative performance diagnosis. GAP has two parts: (1) an algorithm for computing quantitative performance diagnoses; and (2) a framework for constructing diagnostic techniques, which provides the basis for the quantifications produced by the algorithm. The GAP algorithm uses a measurement navigation graph, a directed acyclic graph whose nodes are measurement variables and whose arcs carry weights that quantify the effect of child variables (e.g., LAN utilization) on parent variables (e.g., response time). The framework for developing diagnostic techniques consists of (a) the choice of statistic (e.g., mean, variance) used to aggregate problem values, and (b) the estimator of that statistic.
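As an illustration only, the measurement navigation graph from the example explanation can be sketched as a small weighted DAG. The abstract does not spell out the GAP algorithm, so the traversal below is an assumption: it walks from a problem variable down to its leaf causes and multiplies arc weights along each path to obtain an end-to-end share. The names `diagnose`, `graph`, and the variable labels are hypothetical, and the weights are taken from the example sentence rather than from real measurements.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parent variable -> {child variable: arc weight}.
# A weight is the fraction of the parent's increase attributed to the child.
Graph = Dict[str, Dict[str, float]]


def diagnose(graph: Graph, root: str) -> List[Tuple[List[str], float]]:
    """Return each root-to-leaf path with the product of its arc weights.

    Assumption (not stated in the abstract): the share of a distant cause
    is the product of the weights along the path that leads to it.
    """
    results: List[Tuple[List[str], float]] = []

    def walk(node: str, path: List[str], share: float) -> None:
        children = graph.get(node, {})
        if not children:
            # Leaf variable: record the full path and its accumulated share.
            results.append((path, share))
            return
        for child, weight in children.items():
            walk(child, path + [child], share * weight)

    walk(root, [root], 1.0)
    return results


# Weights from the example explanation in the abstract.
graph: Graph = {
    "web_response_time": {"lan_utilization": 0.20},
    "lan_utilization": {"web_server_traffic": 0.90},
}

paths = diagnose(graph, "web_response_time")
for path, share in paths:
    print(" <- ".join(path), f"{share:.0%}")
```

Under this multiplicative assumption, web server traffic would account for roughly 18% of the increase in web response times (0.20 × 0.90). How the arc weights themselves are obtained depends on the chosen statistic and its estimator, which this sketch does not model.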