Let R be a set of objects. An object o ∈ R is an outlier if there exist fewer than k objects in R whose distances to o are at most r. The values of k and r, and the distance metric, are provided by the user at run time. The objective is to return all outliers with the smallest I/O cost.

This paper considers a generic version of the problem, where no information is available for outlier computation except the objects' mutual distances. We prove an upper bound on the memory consumption that permits the discovery of all outliers by scanning the dataset 3 times. The upper bound turns out to be extremely low in practice, e.g., less than 1% of R. Since the actual memory capacity of a realistic DBMS is typically larger, we develop a novel algorithm that integrates our theoretical findings with carefully designed heuristics, which leverage the additional memory to improve I/O efficiency. Our technique reports all outliers by scanning the dataset at most twice (in some cases, even once), and significantly outperforms the existing solutions, by up to an order of magnitude.
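To make the outlier definition concrete, here is a minimal in-memory sketch in Python. It is a naive quadratic baseline, not the I/O-efficient algorithm the abstract describes. The function name `distance_outliers` and the `count_self` flag are illustrative assumptions: the definition does not say whether o counts among its own neighbors, so the sketch follows the literal reading (o is in R and lies at distance 0 from itself) by default and lets the caller switch it off.

```python
from math import dist  # Euclidean distance; any metric callable can be swapped in


def distance_outliers(objects, k, r, metric=dist, count_self=True):
    """Return indices of objects with fewer than k objects within distance r.

    Assumption: count_self=True follows the literal definition, in which
    an object is within distance r of itself.
    """
    outliers = []
    for i, o in enumerate(objects):
        neighbors = 0
        for j, p in enumerate(objects):
            if i == j and not count_self:
                continue
            if metric(o, p) <= r:
                neighbors += 1
                if neighbors >= k:
                    break  # o already has k neighbors, so it is not an outlier
        if neighbors < k:
            outliers.append(i)
    return outliers


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(distance_outliers(points, k=3, r=0.5))  # prints [3]
```

The early `break` reflects the only pruning available when nothing but mutual distances is known: once k neighbors are found, the object can be discarded. Because the metric is passed in as a callable, the sketch uses only pairwise distances, matching the generic setting of the problem.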