High-dimensional data in Euclidean space pose special challenges to data mining algorithms. These challenges are often indiscriminately subsumed under the term ‘curse of dimensionality’; more concrete aspects are the so-called ‘distance concentration effect’, the presence of irrelevant attributes concealing relevant information, or simply efficiency issues. In just the last few years, the task of unsupervised outlier detection has found new specialized solutions for tackling high-dimensional data in Euclidean space. These approaches fall mainly into two categories, depending on whether or not they consider subspaces (subsets of attributes) for the definition of outliers. The former specifically address the presence of irrelevant attributes; the latter consider irrelevant attributes implicitly at best and are more concerned with general issues of efficiency and effectiveness. Nevertheless, both types of specialized outlier detection algorithms tackle challenges specific to high-dimensional data. In this survey article, we discuss some important aspects of the ‘curse of dimensionality’ in detail and survey specialized algorithms for outlier detection from both categories. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
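The ‘distance concentration effect’ named above can be illustrated with a small experiment. The following is a minimal sketch and is not taken from the survey: it assumes points drawn independently and uniformly from the unit cube, and the helper name `relative_contrast` and its parameters are hypothetical, chosen only for this illustration. It computes the relative contrast (d_max - d_min) / d_min of the Euclidean distances from one random query point to a random sample, which tends to shrink as the dimensionality grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points.

    Assumption for this sketch: all coordinates are i.i.d. uniform on [0, 1].
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the relative contrast for increasing dimensionality.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the nearest and farthest points become nearly equidistant from the query in high dimensions. Real data with correlated or irrelevant attributes may behave differently, which is one reason the subspace-based methods are treated as a separate category.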