A survey of online failure prediction methods
ACM Computing Surveys (CSUR)
Online failure prediction in cloud datacenters by real-time message pattern learning
CLOUDCOM '12 Proceedings of the 2012 IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom)
The size and complexity of cloud environments make them prone to failures. The traditional approach to achieving high dependability in these systems relies on constant monitoring, but this method is purely reactive. Online failure prediction (OFP) techniques offer a more proactive approach. In this paper, we describe an OFP system for private IaaS platforms, currently under development, that combines several types of input data, including monitoring information, event logs, and failure data. The system also operates at both the physical and virtual planes of the cloud, taking into account the relationships between nodes and the failure propagation mechanisms that are unique to cloud environments.