Current technologies allow several sensors to collect data efficiently and produce an overall evaluation of the status of a cluster. However, no previous work of which we are aware analyzes the behavior of the parallel programs themselves in real time. In this paper, we compare different artificial intelligence techniques that can be used to implement a lightweight monitoring and analysis system for parallel applications on a cluster of Linux workstations. We study the accuracy and performance of deterministic and stochastic algorithms when observing the flow of both library-function calls and operating-system calls made by parallel programs written in C with MPI. We demonstrate that MPI programs can be monitored in real time with high accuracy, in some cases with a false-positive rate near 0%, and we show that the added computational load on each node is small. As an example, monitoring function calls with a hidden Markov model generates less than 5% overhead. The proposed system automatically detects deviations of a process from its expected behavior on any node of the cluster, and thus it can be used as an anomaly detector, as a performance monitor that complements other systems, or as a debugging tool. Copyright © 2005 John Wiley & Sons, Ltd.
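To make the hidden Markov model approach mentioned in the abstract more concrete, the following is a minimal sketch of how a window of observed calls could be scored against a discrete HMM and flagged when its per-call log-likelihood drops below a threshold. It is not the authors' system: the two-state model, its probabilities, the call alphabet, the threshold, and the names `forward_log_likelihood` and `is_anomalous` are all hypothetical and chosen only for illustration. In practice the model parameters would be learned from traces of normal runs.

```python
import math

# Hypothetical call alphabet: each observed MPI call is mapped to a symbol index.
CALLS = {"MPI_Send": 0, "MPI_Recv": 1, "MPI_Barrier": 2}

# Toy two-state model (assumed values, not taken from the paper).
# State 0 mostly emits send/receive calls, state 1 mostly emits barriers.
START = [0.9, 0.1]
TRANS = [[0.90, 0.10],
         [0.95, 0.05]]
EMIT = [[0.495, 0.495, 0.01],
        [0.010, 0.010, 0.98]]


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        log_lik += math.log(scale)
        # Rescale so the forward variables never underflow on long windows.
        alpha = [a / scale for a in alpha]
    return log_lik


def is_anomalous(call_names, threshold=-2.0):
    """Flag a window whose average log-likelihood per call is below threshold."""
    obs = [CALLS[name] for name in call_names]
    return forward_log_likelihood(obs, START, TRANS, EMIT) / len(obs) < threshold


normal_window = ["MPI_Send", "MPI_Recv", "MPI_Send", "MPI_Recv",
                 "MPI_Barrier", "MPI_Send", "MPI_Recv"]
odd_window = ["MPI_Barrier"] * 7

print(is_anomalous(normal_window))
print(is_anomalous(odd_window))
```

Normalising the score by window length keeps one threshold usable across windows of different sizes. The cost per observed call is a fixed number of multiplications that depends only on the number of states, which is one reason an approach of this kind can plausibly run alongside the monitored process with low overhead.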