A unified approach for mining outliers

Authors:
Edwin M. Knorr; Raymond T. Ng
Affiliations:
Department of Computer Science, University of British Columbia, Vancouver, B.C. V6T 1Z4, Canada (both authors)
Venue:
CASCON '97 Proceedings of the 1997 conference of the Centre for Advanced Studies on Collaborative research
Year:
1997

Abstract

This paper deals with finding outliers (exceptions) in large datasets. The identification of outliers can often lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. One contribution of this paper is to show how our proposed, intuitive notion of outliers can unify or generalize many of the existing notions of outliers provided by discordancy tests for standard statistical distributions. Thus, when mining large datasets containing many attributes, a unified approach can replace many statistical discordancy tests, regardless of any knowledge about the underlying distribution of the attributes. A second contribution of this paper is the development of an algorithm to find all outliers in a dataset. An important advantage of this algorithm is that its time complexity is linear with respect to the number of objects in the dataset. We include preliminary performance results.
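The abstract does not spell out the proposed notion of outliers, so the following is only a minimal sketch under a stated assumption: that an object counts as an outlier when at least a fraction p of the objects in the dataset lie farther than a distance d from it. The function name `distance_based_outliers`, the parameters `p` and `d`, and the use of Euclidean distance are illustrative choices, not taken from the abstract. The sketch is a naive pairwise scan that is quadratic in the number of objects; it is not the linear-time algorithm the paper describes.

```python
import math


def distance_based_outliers(points, p, d):
    """Return indices of points flagged under an assumed distance-based rule.

    Assumption (not stated in the abstract): a point is an outlier if at
    least a fraction p of all points in the dataset lie farther than
    distance d from it. Distances are Euclidean.
    """
    n = len(points)
    outliers = []
    for i, candidate in enumerate(points):
        # Count the objects lying strictly farther than d from the candidate.
        far = sum(1 for other in points if math.dist(candidate, other) > d)
        if far >= p * n:
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    # Four points clustered near the origin and one isolated point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(distance_based_outliers(data, p=0.75, d=3.0))
```

A rule of this kind needs no knowledge of the underlying distribution of the attributes, only a distance function, which is the property the abstract emphasizes when it says a unified approach can replace many distribution-specific discordancy tests.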