Outlier detection by active learning

  • Authors:
  • Naoki Abe;Bianca Zadrozny;John Langford

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY;Universidade Federal Fluminense, Niterói, RJ, Brazil;Toyota Technological Institute at Chicago, Chicago, IL

  • Venue:
  • Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Most existing approaches to outlier detection are based on density estimation methods. There are two notable issues with these methods: one is the lack of explanation for outlier flagging decisions, and the other is the relatively high computational requirement. In this paper, we present a novel approach to outlier detection based on classification, in an attempt to address both of these issues. Our approach isbased on two key ideas. First, we present a simple reduction of outlier detection to classification, via a procedure that involves applying classification to a labeled data set containing artificially generated examples that play the role of potential outliers. Once the task has been reduced to classification, we then invoke a selective sampling mechanism based on active learning to the reduced classification problem. We empirically evaluate the proposed approach using a number of data sets, and find that our method is superior to other methods based on the same reduction to classification, but using standard classification methods. We also show that it is competitive to the state-of-the-art outlier detection methods in the literature based on density estimation, while significantly improving the computational complexity and explanatory power.