Mining top-n local outliers in large databases

  • Authors:
  • Wen Jin;Anthony K. H. Tung;Jiawei Han

  • Affiliations:
  • Simon Fraser University, Burnaby, B.C., Canada;Simon Fraser University, Burnaby, B.C., Canada;Simon Fraser University, Burnaby, B.C., Canada

  • Venue:
  • Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Outlier detection is an important task in data mining with numerous applications, including credit card fraud detection, video surveillance, etc. A recent work on outlier detection has introduced a novel notion of local outlier in which the degree to which an object is outlying is dependent on the density of its local neighborhood, and each object can be assigned a Local Outlier Factor (LOF) which represents the likelihood of that object being an outlier. Although the concept of local outliers is a useful one, the computation of LOF values for every data objects requires a large number of &kgr;-nearest neighbors searches and can be computationally expensive. Since most objects are usually not outliers, it is useful to provide users with the option of finding only n most outstanding local outliers, i.e., the top-n data objects which are most likely to be local outliers according to their LOFs. However, if the pruning is not done carefully, finding top-n outliers could result in the same amount of computation as finding LOF for all objects. In this paper, we propose a novel method to efficiently find the top-n local outliers in large databases. The concept of "micro-cluster" is introduced to compress the data. An efficient micro-cluster-based local outlier mining algorithm is designed based on this concept. As our algorithm can be adversely affected by the overlapping in the micro-clusters, we proposed a meaningful cut-plane solution for overlapping data. The formal analysis and experiments show that this method can achieve good performance in finding the most outstanding local outliers.