Condensed Nearest Neighbor Data Domain Description

Authors:
Fabrizio Angiulli
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2007

Citing 9

Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension
Machine Learning

Outliers in statistical pattern recognition and an application to automatic chromosome classification
Pattern Recognition Letters

LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Efficient algorithms for mining outliers from large data sets
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Fast Outlier Detection in High Dimensional Spaces
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery

Algorithms for Mining Distance-Based Outliers in Large Datasets
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

Fast condensed nearest neighbor rule
ICML '05 Proceedings of the 22nd international conference on Machine learning

LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)

Another move toward the minimum consistent subset: a tabu search approach to the condensed nearest neighbor rule
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Cited 2

Prototype-based Domain Description
Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence

On the representation of a digital contour with an unordered point set for visual perception
Journal of Visual Communication and Image Representation

Quantified Score

Hi-index	0.14

Abstract

A simple yet effective unsupervised classification rule for discriminating between normal and abnormal data accepts a test object when its nearest-neighbor distances in a reference data set, assumed to model normal behavior, lie within a certain threshold. This work investigates the effect of using a subset of the original data set as the reference set of the classifier. To this end, the concept of a reference consistent subset is introduced, and it is shown that finding a minimum-cardinality reference consistent subset is intractable. The CNNDD algorithm is then described, which computes a reference consistent subset with only two passes over the reference set. Experimental results revealed the advantages of condensing the data set and confirmed the effectiveness of the proposed approach. A thorough comparison with related methods was carried out, pointing out the strengths and weaknesses of one-class nearest-neighbor-based training set consistent condensation.
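
The acceptance rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how the nearest-neighbor distances are aggregated, so the sketch assumes Euclidean distance and sums the distances to the k nearest reference points. The names knn_distance_score, accept, k, and theta are hypothetical, and the CNNDD condensation step itself is not shown.

```python
import math


def knn_distance_score(x, reference, k=1):
    """Sum of Euclidean distances from x to its k nearest reference points.

    Assumption: summing the k smallest distances is one plausible way to
    aggregate "nearest-neighbor distances"; the abstract does not fix this.
    """
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k])


def accept(x, reference, k=1, theta=1.0):
    """Return True when x is judged normal, i.e. its score is within theta."""
    return knn_distance_score(x, reference, k) <= theta


# Toy reference set assumed to model normal behavior (illustrative data only).
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(accept((0.5, 0.5), reference, k=2, theta=1.5))  # close to the reference set
print(accept((5.0, 5.0), reference, k=2, theta=1.5))  # far from the reference set

# Using a subset as the reference set, the question the paper studies.
subset = reference[:2]
print(accept((0.5, 0.5), subset, k=2, theta=1.5))
```

On this toy data the two-point subset happens to give the same decisions as the full reference set for both test points. That only hints at why condensing can work; the formal definition of a reference consistent subset is given in the paper and is not reproduced here.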