Distance-Based Detection and Prediction of Outliers

Authors:
Fabrizio Angiulli;Stefano Basta;Clara Pizzuti
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2006


Abstract

This paper proposes a distance-based outlier detection method that finds the top outliers in an unlabeled data set and provides a subset of it, called the outlier detection solving set, that can be used to predict the outlierness of new, unseen objects. The solving set contains enough points to detect the top outliers while considering only a subset of all the pairwise distances in the data set. The properties of the solving set are investigated, and algorithms that compute it in subquadratic time are proposed. Experiments on synthetic and real data sets evaluate the effectiveness of the approach. A scaling analysis of the solving set size is performed, and the false positive rate, that is, the fraction of new objects misclassified as outliers when the solving set is used instead of the overall data set, is shown to be negligible. Finally, ROC analysis is performed to investigate how accurately the method separates outliers from inliers. The results show that using the solving set instead of the full data set yields predictions of comparable quality at a lower computational cost.
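
The ideas in the abstract can be illustrated with a small sketch. The abstract does not define the outlier score or describe how the solving set is built, so everything below is an assumption made for illustration: the score of an object is taken to be the sum of the distances to its k nearest neighbors, the threshold is the score of the n-th top outlier, and a random sample of the data stands in for the solving set. The function names (knn_weight, top_outliers, false_positive_rate) are hypothetical and are not taken from the paper. The sketch does not implement the paper's subquadratic algorithms; it only shows why scoring against a subset can produce false positives but no false negatives, since removing reference points can only increase nearest-neighbor distances.

```python
import math
import random


def knn_weight(point, reference, k, skip=None):
    """Assumed score: sum of distances from `point` to its k nearest
    neighbors in `reference`. `skip` excludes one index (the point itself)."""
    dists = sorted(
        math.dist(point, q) for i, q in enumerate(reference) if i != skip
    )
    return sum(dists[:k])


def top_outliers(data, k, n):
    """Brute-force (quadratic) ranking: the n objects with the largest score,
    returned as (score, index) pairs in decreasing order of score."""
    scored = sorted(
        ((knn_weight(p, data, k, skip=i), i) for i, p in enumerate(data)),
        reverse=True,
    )
    return scored[:n]


def false_positive_rate(new_objects, data, subset, k, threshold):
    """Fraction of new objects flagged as outliers when scored against
    `subset` although they are not outliers when scored against `data`."""
    if not new_objects:
        return 0.0
    wrong = sum(
        1
        for o in new_objects
        if knn_weight(o, subset, k) >= threshold
        and knn_weight(o, data, k) < threshold
    )
    return wrong / len(new_objects)


# Toy data: a Gaussian cloud plus one planted far-away point (index 200).
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
data.append((10.0, 10.0))

k, n = 5, 3
top = top_outliers(data, k, n)
threshold = top[-1][0]  # score of the n-th top outlier

# Stand-in for the solving set: a random sample (NOT the paper's construction).
subset = random.sample(data, 50)

new_objects = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
fpr = false_positive_rate(new_objects, data, subset, k, threshold)
print("top outlier indices:", [i for _, i in top])
print("false positive rate with the stand-in subset:", fpr)
```

A random sample is a crude stand-in, so the rate it produces says nothing about the paper's results; the abstract reports that the rate is negligible when the actual solving set is used.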