Efficient Pruning Schemes for Distance-Based Outlier Detection

  • Authors:
  • Nguyen Hoang Vu;Vivekanand Gopalkrishnan

  • Affiliations:
  • Nanyang Technological University, Singapore;Nanyang Technological University, Singapore

  • Venue:
  • ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Outlier detection finds many applications, especially in domains that have scope for abnormal behavior. In this paper, we present a new technique for detecting distance-based outliers, aimed at reducing execution time associated with the detection process. Our approach operates in two phases and employs three pruning rules. In the first phase, we partition the data into clusters, and make an early estimate on the lower bound of outlier scores. Based on this lower bound, the second phase then processes relevant clusters using the traditional block nested-loop algorithm. Here two efficient pruning rules are utilized to quickly discard more non-outliers and reduce the search space. Detailed analysis of our approach shows that the additional overhead of the first phase is offset by the reduction in cost of the second phase. We also demonstrate the superiority of our approach over existing distance-based outlier detection methods by extensive empirical studies on real datasets.