An algorithm for finding nearest neighbours in (approximately) constant average time
Pattern Recognition Letters
The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient algorithms for mining outliers from large data sets
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Multidimensional binary search trees used for associative searching
Communications of the ACM
Outlier detection for high dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Mining top-n local outliers in large databases
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Fast Outlier Detection in High Dimensional Spaces
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Algorithms for Mining Distance-Based Outliers in Large Datasets
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Finding Intensional Knowledge of Distance-Based Outliers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Distance-based outliers: algorithms and applications
The VLDB Journal — The International Journal on Very Large Data Bases
A unified approach for mining outliers
CASCON '97 Proceedings of the 1997 conference of the Centre for Advanced Studies on Collaborative research
Mining distance-based outliers in near linear time with randomization and a simple pruning rule
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Outlier Mining in Large High-Dimensional Data Sets
IEEE Transactions on Knowledge and Data Engineering
Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling)
Mining distance-based outliers from large databases in any metric space
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Very efficient mining of distance-based outliers
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Distance-based outlier queries in data streams: the novel task and algorithms
Data Mining and Knowledge Discovery
Outlier detection for simple default theories
Artificial Intelligence
Inter-image outliers and their application to image classification
Pattern Recognition
Mining Outliers with Adaptive Cutoff Update and Space Utilization (RACAS)
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence
A distributed approach to detect outliers in very large data sets
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
On detecting clustered anomalies using SCiForest
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Algorithms for speeding up distance-based outlier detection
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
NDoT: nearest neighbor distance based outlier detection technique
PReMI'11 Proceedings of the 4th international conference on Pattern recognition and machine intelligence
iBAT: detecting anomalous taxi trajectories from GPS traces
Proceedings of the 13th international conference on Ubiquitous computing
Isolation-Based Anomaly Detection
ACM Transactions on Knowledge Discovery from Data (TKDD)
SMART: Stream Monitoring enterprise Activities by RFID Tags
Information Sciences: an International Journal
Proceedings of the 16th International Database Engineering & Applications Sysmposium
AUDIO: an integrity auditing framework of outlier-mining-as-a-service systems
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Subsampling for efficient and effective unsupervised outlier detection ensembles
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Data Mining and Knowledge Discovery
Exploiting domain knowledge to detect outliers
Data Mining and Knowledge Discovery
Hi-index | 0.00 |
In this work a novel distance-based outlier detection algorithm, named DOLPHIN, working on disk-resident datasets and whose I/O cost corresponds to the cost of sequentially reading the input dataset file twice, is presented. It is both theoretically and empirically shown that the main memory usage of DOLPHIN amounts to a small fraction of the dataset and that DOLPHIN has linear time performance with respect to the dataset size. DOLPHIN gains efficiency by naturally merging together in a unified schema three strategies, namely the selection policy of objects to be maintained in main memory, usage of pruning rules, and similarity search techniques. Importantly, similarity search is accomplished by the algorithm without the need of preliminarily indexing the whole dataset, as other methods do. The algorithm is simple to implement and it can be used with any type of data, belonging to either metric or nonmetric spaces. Moreover, a modification to the basic method allows DOLPHIN to deal with the scenario in which the available buffer of main memory is smaller than its standard requirements. DOLPHIN has been compared with state-of-the-art distance-based outlier detection algorithms, showing that it is much more efficient.