In this work, a novel algorithm named DOLPHIN for detecting distance-based outliers is presented. The algorithm performs only two sequential scans of the dataset. It stores a portion of the dataset in main memory in order to search for neighbors efficiently and to prune inliers early. The strategy pursued by the algorithm keeps this portion very small: both theoretical justification and empirical evidence are provided that the stored data amount to only a few percent of the dataset. Another important feature of DOLPHIN is that the memory-resident data are indexed using a suitable proximity search approach, so that a nearest-neighbor search examines only a small subset of the data stored in main memory. Temporal and spatial cost analyses show that the algorithm achieves near-linear CPU and I/O cost. DOLPHIN has been compared with state-of-the-art methods and shown to outperform them.
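To make the two-scan idea concrete, the following is a minimal sketch and not the DOLPHIN algorithm itself. It assumes the usual distance-based definition, under which an object is an outlier if fewer than k other objects lie within distance R of it. The function name `two_scan_outliers`, the parameters `k` and `radius`, and the plain dictionary used in place of a proximity index are all illustrative assumptions. The sketch also discards every proven inlier immediately, which is a simplification chosen for brevity.

```python
from math import dist  # Euclidean distance, Python 3.8+


def two_scan_outliers(data, k, radius):
    """Return sorted indices of objects with fewer than k neighbors within radius.

    Simplified two-scan scheme (an illustration, not DOLPHIN itself).
    """
    # Scan 1: keep in memory only the objects not yet proven to be inliers.
    # candidates maps an object's index to a lower bound on its neighbor count.
    candidates = {}
    for i, x in enumerate(data):
        count = 0
        for j in list(candidates):
            if dist(x, data[j]) <= radius:
                count += 1
                candidates[j] += 1
                if candidates[j] >= k:
                    del candidates[j]  # proven inlier: prune early
        if count < k:
            candidates[i] = count  # x may still be an outlier

    # Scan 2: recount neighbors exactly for the surviving candidates only.
    exact = {c: 0 for c in candidates}
    for j, y in enumerate(data):
        for c in list(exact):
            if c != j and dist(data[c], y) <= radius:
                exact[c] += 1
                if exact[c] >= k:
                    del exact[c]  # reached k neighbors: not an outlier
    return sorted(exact)


# Four points on a unit square plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(two_scan_outliers(points, k=2, radius=1.5))  # prints [4]
```

The first scan produces only lower bounds on the neighbor counts, so it can safely discard inliers but cannot confirm outliers; the second scan is what makes the result exact. The candidate lookup here is a linear pass over the stored objects, whereas the abstract describes indexing the memory-resident data so that only a small subset is examined.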