Outlier detection and ensemble learning are well-established research directions in data mining, yet the application of ensemble techniques to outlier detection has rarely been studied. Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors. We show analytically and experimentally that an outlier detector based on a subsample can, besides inducing diversity, under certain conditions already improve upon the results of the same outlier detector on the complete dataset. Building an ensemble on top of several subsamples improves the results further. So far, the intuition that ensembles improve over single outlier detectors has simply been transferred from the classification literature; here we also justify analytically why ensembles can be expected to work in the unsupervised setting of outlier detection. As a side effect, running an ensemble of several outlier detectors on subsamples of the dataset is more efficient than ensembles based on other means of introducing diversity and, depending on the sample rate and the size of the ensemble, can be even more efficient than a single outlier detector on the complete data.
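The subsampling idea can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the base detector (distance to the k-th nearest neighbor), the plain averaging of scores, the function names `knn_distance` and `subsample_ensemble_scores`, and the parameters `k`, `sample_rate`, and `n_members`. Each ensemble member draws a subsample without replacement and scores every point of the full dataset against that subsample only.

```python
import math
import random


def knn_distance(point, reference, k):
    """Distance from `point` to its k-th nearest neighbor in `reference`."""
    dists = sorted(math.dist(point, other) for other in reference)
    return dists[k - 1]


def subsample_ensemble_scores(data, k=3, sample_rate=0.5, n_members=10, seed=0):
    """Average kNN-distance outlier scores over several random subsamples.

    Hypothetical helper: the base detector and the averaging rule are
    illustrative choices, not the method described in the abstract.
    """
    rng = random.Random(seed)
    n = len(data)
    # Each subsample needs at least k + 1 points so that a point that is
    # itself in the subsample still has k other neighbors.
    sample_size = min(n, max(k + 1, int(sample_rate * n)))
    totals = [0.0] * n
    for _ in range(n_members):
        chosen = set(rng.sample(range(n), sample_size))
        for i, point in enumerate(data):
            # Score against the subsample only, never against the point itself.
            reference = [data[j] for j in chosen if j != i]
            totals[i] += knn_distance(point, reference, k)
    return [total / n_members for total in totals]


if __name__ == "__main__":
    rng = random.Random(1)
    # 40 points around the origin plus one point placed far away.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    data.append((10.0, 10.0))
    scores = subsample_ensemble_scores(data)
    print(max(range(len(data)), key=scores.__getitem__))
```

In this sketch each member computes roughly n times the subsample size distances instead of n squared for the full dataset, which is where the dependence of the cost on the sample rate and the ensemble size comes from.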