Magnifier: Online Detection of Performance Problems in Large-Scale Cloud Computing Systems

Authors:
Haibo Mi; Huaimin Wang; Gang Yin; Hua Cai; Qi Zhou; Tingtao Sun; Yangfan Zhou
Venue:
SCC '11: Proceedings of the 2011 IEEE International Conference on Services Computing
Year:
2011

Abstract

In large-scale cloud computing systems, even a simple user request may pass through numerous services deployed on different physical machines. As a result, localizing the prime causes of performance degradation online in such systems is a great challenge. Existing end-to-end request tracing approaches are not suitable for online anomaly detection because their time complexity is exponential in the size of the trace logs. In this paper, we propose an approach, named Magnifier, to rapidly diagnose the source of performance degradation in large-scale, non-stop cloud systems. In Magnifier, the execution path graph of a user request is modeled as a hierarchical structure comprising a component layer, a module layer, and a function layer, and anomalies are detected separately in each layer, proceeding from higher layers to lower ones. In each layer, every node is assigned a newly created identifier in addition to the global identifier of the request, which significantly reduces the volume of trace logs that must be parsed and accelerates anomaly detection. We conduct extensive experiments on a real-world enterprise system that provides public services, the Alibaba cloud computing platform. The results show that Magnifier can locate the prime causes of performance degradation more accurately and efficiently.
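The abstract does not say how Magnifier decides that a node is anomalous or how its per-layer identifiers are encoded, so the following Python sketch only illustrates the general idea of top-down, layer-by-layer localization under stated assumptions. The `Span` record schema, the `drill_down` and `is_anomalous` names, and the "latency above mean plus k standard deviations of a historical baseline" test are all hypothetical stand-ins, not the paper's method. Each record carries the global request identifier plus a per-layer node identifier and a pointer to its parent one layer up, so lower-layer records are examined only beneath nodes already flagged at the layer above.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Span:
    """One timed node in a request's execution path (hypothetical schema)."""
    request_id: str           # global identifier shared by the whole request
    layer: int                # 0 = component, 1 = module, 2 = function
    node_id: str              # per-layer identifier, unique within a request
    parent_id: Optional[str]  # node_id of the enclosing node one layer up
    name: str                 # component, module, or function name
    latency_ms: float


def is_anomalous(latency: float, baseline: Sequence[float], k: float = 3.0) -> bool:
    """Stand-in detector: latency above mean + k * stdev of historical samples."""
    return latency > mean(baseline) + k * stdev(baseline)


def drill_down(spans: List[Span], baselines: Dict[str, Sequence[float]],
               k: float = 3.0) -> List[Tuple[int, str]]:
    """Check layers top-down, expanding only the children of flagged nodes."""
    suspects: List[Tuple[int, str]] = []
    frontier: Optional[Set[Tuple[str, str]]] = None  # flagged (request_id, node_id)
    for layer in (0, 1, 2):
        # Only records whose parent was flagged one layer up are examined.
        candidates = [s for s in spans if s.layer == layer and
                      (frontier is None or (s.request_id, s.parent_id) in frontier)]
        flagged = [s for s in candidates
                   if is_anomalous(s.latency_ms, baselines[s.name], k)]
        if not flagged:
            break  # nothing deeper to expand; keep what was found so far
        suspects.extend((layer, s.name) for s in flagged)
        frontier = {(s.request_id, s.node_id) for s in flagged}
    return suspects


# Hypothetical trace of one request: the slow storage component contains a
# slow index module, which in turn contains a slow lookup function.
spans = [
    Span("r1", 0, "c1", None, "storage", 50.0),
    Span("r1", 0, "c2", None, "frontend", 10.0),
    Span("r1", 1, "m1", "c1", "storage.index", 45.0),
    Span("r1", 1, "m2", "c1", "storage.cache", 5.0),
    Span("r1", 1, "m3", "c2", "frontend.render", 4.0),  # under a healthy parent
    Span("r1", 2, "f1", "m1", "index.lookup", 44.0),
    Span("r1", 2, "f2", "m1", "index.sort", 1.0),
]
baselines = {
    "storage": [10, 11, 9, 10, 10], "frontend": [10, 11, 9, 10, 10],
    "storage.index": [5, 6, 4, 5, 5], "storage.cache": [5, 6, 4, 5, 5],
    "frontend.render": [5, 6, 4, 5, 5],
    "index.lookup": [4, 5, 3, 4, 4], "index.sort": [1, 2, 0, 1, 1],
}
print(drill_down(spans, baselines))
# [(0, 'storage'), (1, 'storage.index'), (2, 'index.lookup')]
```

Because each lower layer is filtered by its parent's identifier, records beneath healthy components (here `frontend.render`) are never examined, which mirrors the abstract's point that per-layer identifiers shrink the amount of trace data to parse. A real deployment would presumably index records by `(request_id, parent_id)` instead of rescanning the whole list at every layer.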