This paper deals with finding outliers (exceptions) in large datasets. The identification of outliers can often lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. One contribution of this paper is to show how our proposed, intuitive notion of outliers can unify or generalize many of the existing notions of outliers provided by discordancy tests for standard statistical distributions. Thus, when mining large datasets containing many attributes, a unified approach can replace many statistical discordancy tests, regardless of any knowledge about the underlying distribution of the attributes. A second contribution of this paper is the development of an algorithm to find all outliers in a dataset. An important advantage of this algorithm is that its time complexity is linear with respect to the number of objects in the dataset. We include preliminary performance results.
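The abstract does not spell out its notion of an outlier or the linear-time algorithm, so the following is only an illustrative sketch under a stated assumption: a distance-based definition in which an object counts as an outlier when at least a fraction `p` of the other objects lie farther than a distance `d` from it. The function name `distance_based_outliers`, the parameters `p` and `d`, and the sample points are all hypothetical. The check below is a naive quadratic scan, not the linear-time algorithm the paper describes.

```python
from math import dist


def distance_based_outliers(points, p, d):
    """Return indices of points that are outliers under an ASSUMED
    distance-based definition: at least a fraction p of the other
    points lie farther than d away. Naive O(n^2) scan for illustration."""
    n = len(points)
    outliers = []
    for i, a in enumerate(points):
        # Count how many of the other points are farther than d from a.
        far = sum(1 for j, b in enumerate(points) if j != i and dist(a, b) > d)
        if n > 1 and far >= p * (n - 1):
            outliers.append(i)
    return outliers


# Hypothetical data: four tightly grouped points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
result = distance_based_outliers(points, p=0.9, d=1.0)
print(result)
```

Because the definition uses only pairwise distances, this check needs no assumption about the distribution of the attributes, which is the property the abstract emphasizes.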