Outlier detection by active learning

Authors:
Naoki Abe;Bianca Zadrozny;John Langford
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY;Universidade Federal Fluminense, Niterói, RJ, Brazil;Toyota Technological Institute at Chicago, Chicago, IL
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Abstract

Most existing approaches to outlier detection are based on density estimation methods. These methods have two notable issues: they offer no explanation for their outlier flagging decisions, and they have relatively high computational requirements. In this paper, we present a novel approach to outlier detection based on classification, in an attempt to address both of these issues. Our approach is based on two key ideas. First, we present a simple reduction of outlier detection to classification, via a procedure that involves applying classification to a labeled data set containing artificially generated examples that play the role of potential outliers. Second, once the task has been reduced to classification, we apply a selective sampling mechanism based on active learning to the reduced classification problem. We empirically evaluate the proposed approach using a number of data sets, and find that our method is superior to other methods that are based on the same reduction to classification but use standard classification methods. We also show that it is competitive with state-of-the-art outlier detection methods in the literature based on density estimation, while significantly improving computational complexity and explanatory power.
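
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how artificial examples are drawn, which classifier is used, or how selective sampling is carried out, so every detail below is an assumption. Here the artificial examples are drawn uniformly from the bounding box of the data, the base classifier is a small k-nearest-neighbor vote (a stand-in chosen only to keep the example dependency-free), and selective sampling favors examples on which the current committee is uncertain. The function names `make_reduction_dataset`, `knn_vote`, and `active_outlier_scores` are hypothetical.

```python
import numpy as np

def make_reduction_dataset(X, rng):
    """Reduction step: real points get label 0, artificial points label 1.

    Assumption: artificial points are uniform over the bounding box of X.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    artificial = rng.uniform(lo, hi, size=X.shape)
    data = np.vstack([X, artificial])
    labels = np.concatenate([np.zeros(len(X)), np.ones(len(artificial))])
    return data, labels

def knn_vote(train_X, train_y, query, k=5):
    """Fraction of the k nearest training points that are artificial."""
    dists = np.linalg.norm(query[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def active_outlier_scores(X, rounds=10, subsample=100, seed=0):
    """Score each row of X; higher means it looks more like artificial data."""
    rng = np.random.default_rng(seed)
    data, labels = make_reduction_dataset(X, rng)
    members = []                      # each member is a (points, labels) subsample
    votes = np.full(len(data), 0.5)   # committee starts maximally uncertain
    for _ in range(rounds):
        # Selective sampling: examples with committee votes near 0.5 are
        # drawn more often; the 0.1 floor keeps every example reachable.
        uncertainty = 1.0 - np.abs(2.0 * votes - 1.0)
        p = uncertainty + 0.1
        p /= p.sum()
        size = min(subsample, len(data))
        idx = rng.choice(len(data), size=size, replace=False, p=p)
        members.append((data[idx], labels[idx]))
        votes = np.mean([knn_vote(mx, my, data) for mx, my in members], axis=0)
    # Average the committee's vote on the real points only.
    return np.mean([knn_vote(mx, my, X) for mx, my in members], axis=0)
```

Calling `active_outlier_scores(X)` on an `(n, d)` array returns `n` scores in `[0, 1]`, and points that sit far from the bulk of the data should tend to score higher. Each committee member is fit on a subsample only, which hints at where the computational savings of selective sampling could come from. A k-nearest-neighbor vote does not offer the explanatory power the abstract refers to, so an interpretable classifier would have to replace it for that purpose.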