Efficient Pruning Schemes for Distance-Based Outlier Detection

Authors:
Nguyen Hoang Vu;Vivekanand Gopalkrishnan
Affiliations:
Nanyang Technological University, Singapore;Nanyang Technological University, Singapore
Venue:
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
Year:
2009

Citing 11
Cited 4

LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient algorithms for mining outliers from large data sets

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Mining needle in a haystack: classifying rare classes via two-phase rule induction

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Unified Algorithm for Undirected Discovery of Execption Rules

PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Algorithms for Mining Distance-Based Outliers in Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Clustering Data Streams: Theory and Practice

IEEE Transactions on Knowledge and Data Engineering
Mining distance-based outliers in near linear time with randomization and a simple pruning rule

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Outlier Mining in Large High-Dimensional Data Sets

IEEE Transactions on Knowledge and Data Engineering
An effective and efficient algorithm for high-dimensional outlier detection

The VLDB Journal — The International Journal on Very Large Data Bases
Mining distance-based outliers from large databases in any metric space

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Online Outlier Detection Based on Relative Neighbourhood Dissimilarity

WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering

Distance-based outlier detection: consolidation and renewed bearing

Proceedings of the VLDB Endowment
A survey on unsupervised outlier detection in high-dimensional numerical data

Statistical Analysis and Data Mining
Subsampling for efficient and effective unsupervised outlier detection ensembles

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Outlier detection finds many applications, especially in domains that have scope for abnormal behavior. In this paper, we present a new technique for detecting distance-based outliers, aimed at reducing execution time associated with the detection process. Our approach operates in two phases and employs three pruning rules. In the first phase, we partition the data into clusters, and make an early estimate on the lower bound of outlier scores. Based on this lower bound, the second phase then processes relevant clusters using the traditional block nested-loop algorithm. Here two efficient pruning rules are utilized to quickly discard more non-outliers and reduce the search space. Detailed analysis of our approach shows that the additional overhead of the first phase is offset by the reduction in cost of the second phase. We also demonstrate the superiority of our approach over existing distance-based outlier detection methods by extensive empirical studies on real datasets.