Locality Sensitive Outlier Detection: A ranking driven approach

Authors:
Ye Wang;Srinivasan Parthasarathy;Shirish Tatikonda
Affiliations:
Computer Science and Engineering Department, The Ohio State University, USA;Computer Science and Engineering Department, The Ohio State University, USA;Computer Science and Engineering Department, The Ohio State University, USA
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Citing 0
Cited 4

LSH based outlier detection and its application in distributed setting

Proceedings of the 20th ACM international conference on Information and knowledge management
A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A survey on unsupervised outlier detection in high-dimensional numerical data

Statistical Analysis and Data Mining
Flexible and adaptive subspace search for outlier analysis

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

Outlier detection is fundamental to a variety of database and analytic tasks. Recently, distance-based outlier detection has emerged as a viable and scalable alternative to traditional statistical and geometric approaches. In this article we explore the role of ranking for the efficient discovery of distance-based outliers from large high dimensional data sets. Specifically, we develop a light-weight ranking scheme that is powered by locality sensitive hashing, which reorders the database points according to their likelihood of being an outlier. We provide theoretical arguments to justify the rationale for the approach and subsequently conduct an extensive empirical study highlighting the effectiveness of our approach over extant solutions. We show that our ranking scheme improves the efficiency of the distance-based outlier discovery process by up to 5-fold. Furthermore, we find that using our approach the top outliers can often be isolated very quickly, typically by scanning less than 3% of the data set.