Detection and prediction of distance-based outliers

Authors:
Fabrizio Angiulli; Stefano Basta; Clara Pizzuti
Affiliations:
ICAR-CNR, Via Pietro Bucci, Rende (CS), Italy; ICAR-CNR, Via Pietro Bucci, Rende (CS), Italy; ICAR-CNR, Via Pietro Bucci, Rende (CS), Italy
Venue:
Proceedings of the 2005 ACM symposium on Applied computing
Year:
2005

Abstract

We present an unsupervised distance-based outlier detection method that learns a model from the objects in a data set. The learned model, called a solving set, is a small subset of the data set that is used to classify new, unseen objects as outliers or not. We provide an algorithm that computes a solving set in sub-quadratic time, and we give experimental evidence that the computed solving set is small and that the false positive rate, i.e., the fraction of new objects misclassified as outliers when the solving set is used instead of the whole data set, is negligible.
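The prediction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes an outlier score equal to the sum of the distances from an object to its k nearest neighbours, and it assumes a fixed score threshold. The names `knn_weight` and `is_outlier`, the toy data, the hand-picked subset, and the values of `k` and `threshold` are all hypothetical. The sketch does not show how a solving set is computed; it only shows how a subset could stand in for the whole data set when scoring a new object.

```python
import math


def knn_weight(query, reference, k):
    """Sum of the distances from `query` to its k nearest points in `reference`.

    Assumed scoring rule for this sketch; the abstract does not define one.
    """
    distances = sorted(math.dist(query, point) for point in reference)
    return sum(distances[:k])


def is_outlier(query, reference, k, threshold):
    """Flag `query` as an outlier when its score reaches the threshold."""
    return knn_weight(query, reference, k) >= threshold


# Hypothetical toy data: a tight cluster plus one distant point.
data_set = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]

# Hand-picked subset standing in for a solving set (illustration only).
solving_set = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]

k = 2            # assumed number of neighbours
threshold = 5.0  # assumed score cut-off

for new_object in [(0.2, 0.3), (5.0, 5.0)]:
    with_subset = is_outlier(new_object, solving_set, k, threshold)
    with_all = is_outlier(new_object, data_set, k, threshold)
    print(new_object, "subset:", with_subset, "full data set:", with_all)
```

Under this assumed scoring rule, a score computed against a subset can never be smaller than the score computed against the whole data set, because removing points cannot bring the nearest neighbours closer. A subset can therefore only add false positives and never hide an outlier, which is consistent with the abstract reporting a false positive rate.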