The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Prediction with local patterns using cross-entropy
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An Algorithm for Finding Best Matches in Logarithmic Expected Time
ACM Transactions on Mathematical Software (TOMS)
Outlier detection for high dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
On the 'Dimensionality Curse' and the 'Self-Similarity Blessing'
IEEE Transactions on Knowledge and Data Engineering
Algorithms for Mining Distance-Based Outliers in Large Datasets
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Mining distance-based outliers in near linear time with randomization and a simple pruning rule
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Data Mining: Concepts and Techniques
Feature bagging for outlier detection
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Angle-based outlier detection in high-dimensional data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
A Fast Feature-Based Method to Detect Unusual Patterns in Multidimensional Datasets
DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
The local outlier factor (LOF) is a useful method for detecting outliers because it is model-free and locally based. However, the method is very slow on high-dimensional datasets. In this paper, we introduce a randomization method that computes LOF very efficiently for high-dimensional datasets. Based on a consistency property of outliers, random points are selected to partition the data set so that outlier candidates can be computed locally. Since the probability that a point is isolated from its neighbors is small, we apply multiple iterations with random partitions to prune false outliers. Experiments on a variety of real and synthetic datasets show that the randomization is effective in computing LOF, and that our method computes LOF very efficiently on high-dimensional data.
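The scheme the abstract describes — random pivot points partition the data, LOF is computed within each partition, and repeated random partitions prune false outliers — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the number of pivots, the LOF threshold of 1.5, and the rule that a candidate must look outlying in every iteration are all choices made here for illustration.

```python
import numpy as np

def lof_scores(X, k=5):
    """Standard LOF scores via brute-force k-NN (O(n^2), for small partitions)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)              # exclude self from neighbor search
    knn = np.argsort(D, axis=1)[:, :k]       # indices of the k nearest neighbors
    k_dist = D[np.arange(n), knn[:, -1]]     # distance to each point's k-th neighbor
    # Reachability distance: max(k-distance of the neighbor, actual distance).
    reach = np.maximum(k_dist[knn], D[np.arange(n)[:, None], knn])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)  # local reachability density
    return lrd[knn].mean(axis=1) / lrd        # LOF = avg neighbor lrd / own lrd

def randomized_lof_outliers(X, k=5, n_pivots=4, n_iters=10, threshold=1.5, rng=None):
    """Flag outliers by computing LOF locally inside random partitions.

    Each iteration picks random pivots, assigns every point to its nearest
    pivot, and scores each partition with LOF. A point survives as an outlier
    candidate only if it looks outlying in every iteration (hypothetical
    pruning rule; the paper's exact rule may differ).
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    flagged = np.ones(n, dtype=bool)  # candidate set, pruned across iterations
    for _ in range(n_iters):
        pivots = X[rng.choice(n, size=n_pivots, replace=False)]
        assign = np.argmin(
            np.linalg.norm(X[:, None, :] - pivots[None, :, :], axis=2), axis=1)
        iter_flag = np.zeros(n, dtype=bool)
        for p in range(n_pivots):
            idx = np.where(assign == p)[0]
            if len(idx) <= k:
                iter_flag[idx] = True  # too few points to judge; keep candidates
                continue
            iter_flag[idx] = lof_scores(X[idx], k=k) > threshold
        flagged &= iter_flag  # prune points that look normal in this partition
    return flagged
```

With a dense cluster plus one far-away point, the far point keeps its high LOF inside whichever partition it lands in, while cluster points are quickly pruned as false candidates; each iteration costs only the sum of squared partition sizes rather than the full O(n^2) LOF computation.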