An effective and efficient algorithm for high-dimensional outlier detection

Authors:
C. Aggarwal;S. Yu
Affiliations:
IBM T.J. Watson Research Center, USA;IBM T.J. Watson Research Center, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2005

Abstract

The outlier detection problem has important applications in fraud detection, network robustness analysis, and intrusion detection. Most such applications arise in high-dimensional domains, where the data can contain hundreds of dimensions. Many recently proposed outlier detection algorithms rely on some concept of proximity, finding outliers based on their relationship to the other points in the data. However, in high-dimensional space the data are sparse, and proximity-based concepts lose their effectiveness. In fact, the sparsity of high-dimensional data can be understood to imply that every point is an equally good outlier from the perspective of distance-based definitions. Consequently, for high-dimensional data, finding meaningful outliers becomes substantially more complex and nonobvious. In this paper, we discuss new techniques for outlier detection that find outliers by studying the behavior of projections of the data set.
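The claim that distance-based definitions lose their meaning in sparse, high-dimensional data can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes points drawn uniformly from the unit hypercube, uses Euclidean distance, and the function name `relative_contrast` is a hypothetical helper introduced here. It measures how much farther the farthest point is from a query than the nearest point, relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Assumption for illustration only: the query and all data points are
    drawn uniformly from [0, 1]^dim and compared by Euclidean distance.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # The contrast is expected to shrink as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(500, dim), 3))
```

Under these assumptions the contrast is large in two dimensions and shrinks toward zero as the dimensionality grows, so the nearest and farthest points become nearly equidistant from the query. This is one way to read the statement that every point looks like an equally good outlier under a distance-based definition, and it suggests why examining lower-dimensional projections may be more informative.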