Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, much research has focused on developing fast distance-based outlier detection algorithms. Several existing algorithms report log-linear time performance as a function of the number of data points on many real low-dimensional datasets. However, these algorithms cannot deliver the same level of performance on high-dimensional datasets, since their scaling behavior is exponential in the number of dimensions. In this paper, we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high-dimensional datasets. RBRP scales log-linearly as a function of the number of data points and linearly as a function of the number of dimensions. Our empirical evaluation demonstrates that RBRP outperforms the state-of-the-art algorithm, often by an order of magnitude.
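
To make the notion of a distance-based outlier concrete, the following is a minimal sketch of a brute-force baseline, not RBRP itself, whose internals the abstract does not describe. It assumes one common formulation: each point is scored by the Euclidean distance to its k-th nearest neighbor, and the n highest-scoring points are reported. The function name `knn_distance_outliers` and the parameters `k` and `n` are illustrative choices, and the quadratic cost of this loop is the kind of expense that fast algorithms aim to avoid.

```python
import heapq
import math


def knn_distance_outliers(points, k, n):
    """Return the n points with the largest distance to their k-th nearest neighbor.

    Illustrative brute-force baseline (quadratic in the number of points).
    Assumes 1 <= k < len(points) and that every point has the same dimension.
    Returns a list of (index, score) pairs, highest score first.
    """
    scored = []
    for i, p in enumerate(points):
        # Distances from p to every other point in the dataset.
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        # Outlier score: distance to the k-th nearest neighbor.
        kth = heapq.nsmallest(k, dists)[-1]
        scored.append((kth, i))
    # Keep the n points with the largest scores.
    return [(i, score) for score, i in heapq.nlargest(n, scored)]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(knn_distance_outliers(data, k=2, n=1))
```

In this toy dataset the point (10, 10) lies far from the tight cluster near the origin, so it receives the largest score.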