Statistical selection of relevant subspace projections for outlier ranking

Authors:
Emmanuel Muller;Matthias Schiffer;Thomas Seidl
Affiliations:
Karlsruhe Institute of Technology (KIT), Germany;RWTH Aachen University, Germany;RWTH Aachen University, Germany
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Citing 0
Cited 7

AnyOut: anytime outlier detection on streaming data

DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
OutRules: a framework for outlier descriptions in multiple context spaces

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Outlier ensembles: position paper

ACM SIGKDD Explorations Newsletter
Flexible and adaptive subspace search for outlier analysis

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection

Data Mining and Knowledge Discovery
Ensembles for unsupervised outlier detection: challenges and research questions a position paper

ACM SIGKDD Explorations Newsletter

Quantified Score

Hi-index	0.00

Visualization

Abstract

Outlier mining is an important data analysis task to distinguish exceptional outliers from regular objects. For outlier mining in the full data space, there are well established methods which are successful in measuring the degree of deviation for outlier ranking. However, in recent applications traditional outlier mining approaches miss outliers as they are hidden in subspace projections. Especially, outlier ranking approaches measuring deviation on all available attributes miss outliers deviating from their local neighborhood only in subsets of the attributes. In this work, we propose a novel outlier ranking based on the objects deviation in a statistically selected set of relevant subspace projections. This ensures to find objects deviating in multiple relevant subspaces, while it excludes irrelevant projections showing no clear contrast between outliers and the residual objects. Thus, we tackle the general challenges of detecting outliers hidden in subspaces of the data. We provide a selection of subspaces with high contrast and propose a novel ranking based on an adaptive degree of deviation in arbitrary subspaces. In thorough experiments on real and synthetic data we show that our approach outperforms competing outlier ranking approaches by detecting outliers in arbitrary subspace projections.