A unified approach for mining outliers

  • Authors:
  • Edwin M. Knorr; Raymond T. Ng

  • Affiliations:
  • Department of Computer Science, University of British Columbia, Vancouver, B.C. V6T 1Z4, Canada (both authors)

  • Venue:
  • CASCON '97 Proceedings of the 1997 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 1997

Abstract

This paper deals with finding outliers (exceptions) in large datasets. The identification of outliers can often lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. One contribution of this paper is to show how our proposed, intuitive notion of outliers can unify or generalize many of the existing notions of outliers provided by discordancy tests for standard statistical distributions. Thus, when mining large datasets containing many attributes, a unified approach can replace many statistical discordancy tests, regardless of any knowledge about the underlying distribution of the attributes. A second contribution of this paper is the development of an algorithm to find all outliers in a dataset. An important advantage of this algorithm is that its time complexity is linear with respect to the number of objects in the dataset. We include preliminary performance results.
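
The abstract does not spell out its notion of an outlier. As an illustration only, the sketch below assumes the distance-based definition usually associated with this line of work: an object is a DB(p, D)-outlier if at least a fraction p of the objects in the dataset lie at a distance greater than D from it. The function name `db_outliers` and its parameters are hypothetical, and the code is a naive quadratic-time check, not the linear-time algorithm the abstract refers to.

```python
import math
from typing import List, Sequence


def db_outliers(points: Sequence[Sequence[float]], p: float, d: float) -> List[int]:
    """Return indices of points that satisfy the assumed DB(p, d) definition.

    Naive O(n^2) pairwise check using Euclidean distance; illustrative only.
    """
    n = len(points)
    outliers = []
    for i, a in enumerate(points):
        # Count the objects lying farther than d from point a.
        # The point itself is at distance 0, so it never counts as "far".
        far = sum(1 for b in points if math.dist(a, b) > d)
        # Outlier if at least a fraction p of all n objects are far away.
        if far >= p * n:
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    # Four points clustered near the origin and one isolated point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(db_outliers(data, p=0.75, d=3.0))
```

In this toy dataset only the isolated point (index 4) has at least 75% of the objects farther than 3 units away. The choice of p and D is left to the user, and no assumption about the underlying distribution of the attributes is needed.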