Pinpointing the Subsystems Responsible for the Performance Deviations in a Load Test

Authors:
Haroon Malik;Bram Adams;Ahmed E. Hassan
Affiliations:
-;-;-
Venue:
ISSRE '10 Proceedings of the 2010 IEEE 21st International Symposium on Software Reliability Engineering
Year:
2010

Citing 0
Cited 2

Automatic detection of performance deviations in the load testing of large scale systems

Proceedings of the 2013 International Conference on Software Engineering
An online service-oriented performance profiling tool for cloud computing systems

Frontiers of Computer Science: Selected Publications from Chinese Universities

Quantified Score

Hi-index	0.00

Visualization

Abstract

Large scale systems (LSS) contain multiple subsystems that interact across multiple nodes in sometimes unforeseen and complicated ways. As a result, pinpointing the subsystems that are the source of performance degradation for a load test in LSS can be frustrating, and might take several hours or even days. This is due to the large volume of performance counter data collected such as CPU utilization, Disk I/O, memory consumption and network traffic, the limited operational knowledge of analysts about all subsystems of an LSS and the unavailability of up-to-date documentation in a LSS. We have developed a methodology that automatically ranks the subsystems according to the deviation of their performance in a load test. Our methodology uses performance counter data of a load test to craft performance signatures for the LSS subsystems. Pair-wise correlations among the performance signatures of subsystems within a load test are compared with the corresponding correlations in a baseline test to pinpoint the subsystems responsible for the performance violations. Case studies on load test data obtained from a large telecom system and that of an open source benchmark application show that our approach provides an accuracy of 79% and do not require any instrumentation or domain knowledge to operate.