Outlier detection by sampling with accuracy guarantees

  • Authors:
  • Mingxi Wu; Christopher Jermaine

  • Affiliations:
  • Department of Computer and Information Sciences and Engineering, University of Florida, Gainesville, FL, USA, 32611, mwu@cise.ufl.edu; Department of Computer and Information Sciences and Engineering, University of Florida, Gainesville, FL, USA, 32611, cjermain@cise.ufl.edu

  • Venue:
  • Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2006

Abstract

An effective approach to detecting anomalous points in a data set is distance-based outlier detection. This paper describes a simple sampling algorithm to efficiently detect distance-based outliers in domains where each and every distance computation is very expensive. Unlike any existing algorithms, the sampling algorithm requires a fixed number of distance computations and can return good results with accuracy guarantees. The most computationally expensive aspect of estimating the accuracy of the result is sorting all of the distances computed by the sampling algorithm. The experimental study on two expensive domains as well as ten additional real-life datasets demonstrates both the efficiency and effectiveness of the sampling algorithm in comparison with the state-of-the-art algorithm and the reliability of the accuracy guarantees.
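
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a sampling-based detector with a fixed distance budget might look. It assumes a common definition of a distance-based outlier (a point whose distance to its k-th nearest neighbor is large) and estimates that distance from a random sample drawn separately for each point. The function names, the parameters `k`, `sample_size`, and `num_outliers`, and the per-point sampling scheme are illustrative assumptions, not details taken from the abstract; the accuracy-guarantee step is not shown.

```python
import math
import random


def sample_outlier_scores(points, k, sample_size, dist, rng):
    """Score each point by its distance to the k-th nearest neighbor
    within a random sample of the other points (assumed scoring rule)."""
    n = len(points)
    scores = []
    for i, p in enumerate(points):
        # Draw sample_size other points uniformly, without replacement.
        others = [j for j in range(n) if j != i]
        sample = rng.sample(others, sample_size)
        # Exactly sample_size distance computations per point.
        dists = sorted(dist(p, points[j]) for j in sample)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, sample_size, num_outliers, dist=math.dist, seed=0):
    """Return the indices of the num_outliers highest-scoring points."""
    if not 1 <= k <= sample_size < len(points):
        raise ValueError("need 1 <= k <= sample_size < len(points)")
    rng = random.Random(seed)
    scores = sample_outlier_scores(points, k, sample_size, dist, rng)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_outliers]
```

Under these assumptions the total cost is `len(points) * sample_size` distance computations regardless of the data, which is one way to read the "fixed number of distance computations" claim. The `dist` argument can be replaced by any expensive domain-specific distance function.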