LSH based outlier detection and its application in distributed setting
Proceedings of the 20th ACM international conference on Information and knowledge management
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A survey on unsupervised outlier detection in high-dimensional numerical data
Statistical Analysis and Data Mining
Flexible and adaptive subspace search for outlier analysis
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
Outlier detection is fundamental to a variety of database and analytic tasks. Recently, distance-based outlier detection has emerged as a viable and scalable alternative to traditional statistical and geometric approaches. In this article we explore the role of ranking for the efficient discovery of distance-based outliers from large high dimensional data sets. Specifically, we develop a light-weight ranking scheme that is powered by locality sensitive hashing, which reorders the database points according to their likelihood of being an outlier. We provide theoretical arguments to justify the rationale for the approach and subsequently conduct an extensive empirical study highlighting the effectiveness of our approach over extant solutions. We show that our ranking scheme improves the efficiency of the distance-based outlier discovery process by up to 5-fold. Furthermore, we find that using our approach the top outliers can often be isolated very quickly, typically by scanning less than 3% of the data set.