Iterative Boolean combination of classifiers in the ROC space: An application to anomaly detection with HMMs

  • Authors:
  • Wael Khreich;Eric Granger;Ali Miri;Robert Sabourin

  • Affiliations:
  • Laboratoire d'imagerie, de vision et d'intelligence artificielle (LIVIA), École de technologie supérieure, Université du Québec, 1100 Notre-Dame Ouest, Montreal, QC, Canada;Laboratoire d'imagerie, de vision et d'intelligence artificielle (LIVIA), École de technologie supérieure, Université du Québec, 1100 Notre-Dame Ouest, Montreal, QC, Canada;School of Information Technology and Engineering (SITE), University of Ottawa, 161 Louis Pasteur, Ottawa, ON, Canada;Laboratoire d'imagerie, de vision et d'intelligence artificielle (LIVIA), École de technologie supérieure, Université du Québec, 1100 Notre-Dame Ouest, Montreal, QC, Canada

  • Venue:
  • Pattern Recognition
  • Year:
  • 2010

Abstract

Hidden Markov models (HMMs) have been shown to provide a high level of performance for detecting anomalies in sequences of system calls to the operating system kernel. Using Boolean conjunction and disjunction functions to combine the responses of multiple HMMs in the ROC space may significantly improve performance over a "single best" HMM. However, these techniques assume that the classifiers are conditionally independent and that their ROC curves are convex. These assumptions are violated in most real-world applications, especially when classifiers are designed using limited and imbalanced training data. In this paper, the iterative Boolean combination (IBC) technique is proposed for efficient fusion of the responses from multiple classifiers in the ROC space. It applies all Boolean functions to combine the ROC curves corresponding to multiple classifiers, requires no prior assumptions, and has a time complexity that is linear in the number of classifiers. The results of computer simulations conducted on both synthetic and real-world host-based intrusion detection data indicate that the IBC of responses from multiple HMMs can achieve a significantly higher level of performance than the Boolean conjunction and disjunction combinations, especially when training data are limited and imbalanced. The proposed IBC is general in that it can be employed to combine diverse responses of any crisp or soft one- or two-class classifiers, and for a wide range of application domains.
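
To make the idea of Boolean combination in the ROC space concrete, here is a minimal sketch for two detectors. It is not the authors' IBC algorithm: the function names, the selection of Boolean functions, and the toy scores are all assumptions made for illustration. Each detector's scores are thresholded into crisp decisions, every threshold pair is fused with each Boolean function, and only the ROC points that no other point dominates are kept.

```python
from itertools import product

# Illustrative Boolean functions of two crisp decisions. This particular
# selection is an assumption made for the sketch, not taken from the paper.
BOOLEAN_FUNCS = {
    "a AND b": lambda a, b: a and b,
    "a OR b": lambda a, b: a or b,
    "a XOR b": lambda a, b: a != b,
    "a AND NOT b": lambda a, b: a and not b,
    "NOT a AND b": lambda a, b: (not a) and b,
}


def roc_point(decisions, labels):
    """Return (false positive rate, true positive rate) for crisp decisions.

    Labels are assumed to be 1 (anomalous) or 0 (normal).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for d, y in zip(decisions, labels) if d and y == 1)
    fp = sum(1 for d, y in zip(decisions, labels) if d and y == 0)
    return fp / neg, tp / pos


def combine_in_roc_space(scores_a, scores_b, labels):
    """Fuse two detectors over all threshold pairs and Boolean functions.

    Returns a dict mapping each non-dominated (fpr, tpr) point to a
    description of one rule that reaches it.
    """
    candidates = {}
    for ta, tb in product(sorted(set(scores_a)), sorted(set(scores_b))):
        dec_a = [s >= ta for s in scores_a]
        dec_b = [s >= tb for s in scores_b]
        for name, func in BOOLEAN_FUNCS.items():
            fused = [func(a, b) for a, b in zip(dec_a, dec_b)]
            point = roc_point(fused, labels)
            # Remember only the first rule found for each ROC point.
            candidates.setdefault(point, f"{name} (ta={ta}, tb={tb})")
    # Keep points that no other point beats on both FPR and TPR.
    return {
        p: rule
        for p, rule in candidates.items()
        if not any(q != p and q[0] <= p[0] and q[1] >= p[1] for q in candidates)
    }


# Hypothetical toy data: three anomalous and three normal sequences.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]
scores_b = [0.3, 0.9, 0.8, 0.2, 0.7, 0.1]

front = combine_in_roc_space(scores_a, scores_b, labels)
for (fpr, tpr), rule in sorted(front.items()):
    print(f"fpr={fpr:.2f} tpr={tpr:.2f} via {rule}")
```

On this toy data, neither detector alone separates the two classes at any single threshold, yet a fused rule does. The sketch stops after one exhaustive pass over two detectors; the iterative procedure and the linear scaling with the number of classifiers described in the abstract are not reproduced here.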