Adaptive ROC-based ensembles of HMMs applied to anomaly detection

  • Authors:
  • Wael Khreich;Eric Granger;Ali Miri;Robert Sabourin

  • Affiliations:
  • Laboratoire d'imagerie, de vision et d'intelligence artificielle (LIVIA), École de technologie supérieure, Université du Québec, 1100 Notre-Dame Ouest, Montreal, QC, Canada;Laboratoire d'imagerie, de vision et d'intelligence artificielle (LIVIA), École de technologie supérieure, Université du Québec, 1100 Notre-Dame Ouest, Montreal, QC, Canada;School of Computer Science, Ryerson University, Toronto, Canada;Laboratoire d'imagerie, de vision et d'intelligence artificielle (LIVIA), École de technologie supérieure, Université du Québec, 1100 Notre-Dame Ouest, Montreal, QC, Canada

  • Venue:
  • Pattern Recognition
  • Year:
  • 2012

Abstract

Hidden Markov models (HMMs) have been successfully applied in many intrusion detection applications, including anomaly detection from sequences of operating system calls. In practice, anomaly detection systems (ADSs) based on HMMs typically generate false alarms because they are designed using a limited amount of representative training data. Since new data may become available over time, an important feature of an ADS is the ability to accommodate newly acquired data incrementally, after it has originally been trained and deployed for operations. In this paper, a system based on the receiver operating characteristic (ROC) is proposed to efficiently adapt ensembles of HMMs (EoHMMs) in response to new data, according to a learn-and-combine approach. When a new block of training data becomes available, a pool of base HMMs is generated from the data using different numbers of HMM states and random initializations. The responses from the newly trained HMMs are then combined with those of the previously trained HMMs in ROC space using a novel incremental Boolean combination (incrBC) technique. Finally, specialized model management algorithms select a diversified EoHMM from the pool and adapt the Boolean fusion functions and thresholds for improved performance, while pruning redundant base HMMs. The proposed system allows the desired operating point to be changed during operations, and this point can be adjusted to changes in prior probabilities and costs of errors. Computer simulations conducted on synthetic and real-world host-based intrusion detection data indicate that the proposed system can achieve a significantly higher level of performance than when the parameters of a single best HMM are estimated, at each learning stage, using reference batch and incremental learning techniques. It also outperforms learn-and-combine approaches that use static fusion functions (e.g., majority voting).
Over time, the proposed ensemble selection algorithms form compact EoHMMs while maintaining or improving system accuracy. Pruning keeps the pool from growing indefinitely, thereby reducing the storage space needed for HMM parameters without negatively affecting the overall EoHMM performance. Although applied to HMM-based ADSs, the proposed approach is general and can be employed for a wide range of classifiers and detection applications.
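
The abstract does not spell out the incrBC algorithm, so the following is only a minimal sketch of the general idea behind Boolean combination in ROC space, not the authors' method. It assumes two detectors that each output one anomaly score per sequence, thresholds both, fuses the binary decisions with a few Boolean functions, and then picks the fused operating point with the lowest expected cost for a given prior and pair of error costs. All names (`rates`, `combine`, `best_point`, `BOOLEAN_FUNCS`) and the toy scores are hypothetical.

```python
from itertools import product


def rates(decisions, labels):
    """Return (fpr, tpr) for binary decisions against labels (1 = anomaly)."""
    tp = sum(1 for d, y in zip(decisions, labels) if d and y)
    fp = sum(1 for d, y in zip(decisions, labels) if d and not y)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos


# A small, illustrative subset of Boolean fusion functions (an assumption).
BOOLEAN_FUNCS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
    "A_AND_NOT_B": lambda a, b: a and not b,
}


def combine(scores_a, scores_b, labels):
    """Enumerate every (threshold_a, threshold_b, function) triple.

    Each result is (fpr, tpr, function_name, threshold_a, threshold_b).
    """
    points = []
    for ta, tb in product(sorted(set(scores_a)), sorted(set(scores_b))):
        da = [s >= ta for s in scores_a]
        db = [s >= tb for s in scores_b]
        for name, func in BOOLEAN_FUNCS.items():
            fused = [func(a, b) for a, b in zip(da, db)]
            fpr, tpr = rates(fused, labels)
            points.append((fpr, tpr, name, ta, tb))
    return points


def best_point(points, p_pos, cost_fp, cost_fn):
    """Pick the ROC point with minimum expected cost for the given
    anomaly prior p_pos and false-positive / false-negative costs."""
    def expected_cost(pt):
        fpr, tpr = pt[0], pt[1]
        return (1 - p_pos) * cost_fp * fpr + p_pos * cost_fn * (1 - tpr)
    return min(points, key=expected_cost)


if __name__ == "__main__":
    # Toy data: detector A catches the first two anomalies, B the last two.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    scores_a = [0.1, 0.2, 0.1, 0.3, 0.9, 0.8, 0.2, 0.1]
    scores_b = [0.2, 0.1, 0.3, 0.1, 0.1, 0.2, 0.9, 0.8]
    points = combine(scores_a, scores_b, labels)
    print(best_point(points, p_pos=0.5, cost_fp=1.0, cost_fn=1.0))
```

On this toy data, neither detector alone can reach a true positive rate of 1 without also raising false alarms, whereas a fused point can. Calling `best_point` again with a different prior or different costs moves the selected operating point without retraining anything, which is the kind of adjustment the abstract describes. The exhaustive enumeration here is chosen for clarity only and would not scale to a large pool of HMMs.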