Feature bagging for outlier detection

Authors:
Aleksandar Lazarevic; Vipin Kumar
Affiliations:
University of Minnesota, East Hartford, CT; University of Minnesota, Minneapolis, MN
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

Outlier detection has recently become an important problem in many industrial and financial applications. This paper proposes a novel feature bagging approach for detecting outliers in very large, high-dimensional, and noisy databases. The approach combines the results of multiple outlier detection algorithms, each applied to a different set of features. Every outlier detection algorithm uses a small subset of features selected at random from the original feature set. As a result, each detector identifies different outliers and assigns every data record an outlier score that corresponds to its probability of being an outlier. The outlier scores computed by the individual detectors are then combined to find higher-quality outliers. Experiments on several synthetic and real-life data sets show that the proposed methods for combining the outputs of multiple outlier detection algorithms provide non-trivial improvements over the base algorithm.
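The feature bagging loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the base detector nor the combination rule, so the sketch swaps in a simple k-nearest-neighbour distance score as the base detector and averages per-round scores after scaling each round by its maximum. The subset-size range (d // 2 to d - 1 features), the function names `knn_distance_scores` and `feature_bagging`, and the parameters `rounds`, `k`, and `seed` are all hypothetical choices made for this example.

```python
import math
import random


def knn_distance_scores(data, features, k):
    """Hypothetical base detector (an assumption, not the paper's): score
    each record by the Euclidean distance to its k-th nearest neighbour,
    computed only on the selected feature indices. Larger = more outlying."""
    if not 0 < k < len(data):
        raise ValueError("k must satisfy 0 < k < number of records")
    # Project every record onto the randomly chosen feature subset.
    projected = [[row[f] for f in features] for row in data]
    scores = []
    for i, p in enumerate(projected):
        dists = sorted(math.dist(p, q) for j, q in enumerate(projected) if j != i)
        scores.append(dists[k - 1])
    return scores


def feature_bagging(data, rounds=10, k=2, seed=0):
    """Run the base detector on `rounds` random feature subsets and average
    the per-round scores after scaling each round by its maximum score."""
    rng = random.Random(seed)
    d = len(data[0])
    combined = [0.0] * len(data)
    for _ in range(rounds):
        # Assumed subset size: uniformly between d // 2 and d - 1 features.
        size = rng.randint(max(1, d // 2), max(1, d - 1))
        features = rng.sample(range(d), size)
        scores = knn_distance_scores(data, features, k)
        top = max(scores) or 1.0  # avoid dividing by zero if all scores are 0
        for i, s in enumerate(scores):
            combined[i] += s / top
    return [c / rounds for c in combined]


# Five tightly clustered records plus one that deviates in every feature.
data = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
    [0.1, 0.1, 0.1],
    [0.05, 0.05, 0.05],
    [5.0, 5.0, 5.0],
]
scores = feature_bagging(data, rounds=20, k=2)
ranking = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
print(ranking[0])  # → 5
```

Scaling each round by its own maximum keeps subsets whose raw distances happen to be large from dominating the average, so every detector contributes on a comparable scale; a different normalization or a rank-based combination could be substituted without changing the overall bagging structure.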