Condensed nearest neighbor data domain description

  • Authors:
  • Fabrizio Angiulli

  • Affiliations:
  • ICAR-CNR, Rende (CS), Italy

  • Venue:
  • IDA'05: Proceedings of the 6th International Conference on Advances in Intelligent Data Analysis
  • Year:
  • 2005

Abstract

A popular method for discriminating between normal and abnormal data accepts a test object when its nearest-neighbor distances in a reference data set lie within a given threshold. In this work we investigate the possibility of using a subset of the original data set as the reference set. We discuss the relationship between reference set size and generalization, and show that finding a minimum-cardinality reference consistent subset is intractable. We then describe an algorithm that computes a reference consistent subset with only two reference set passes. Experimental results confirm the effectiveness of the approach.
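
The acceptance rule in the first sentence of the abstract can be illustrated with a minimal sketch. The abstract does not say how the nearest-neighbor distances are combined or which metric is used, so this example assumes Euclidean distance and compares the sum of the distances to the k nearest reference objects against the threshold. The function names, the parameter names `k` and `threshold`, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def knn_distance_sum(x, reference, k):
    """Sum of Euclidean distances from x to its k nearest reference objects.

    Assumption: the abstract does not specify the aggregation or the metric;
    the sum of Euclidean distances is used here only for illustration.
    """
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k])


def accepts(x, reference, k, threshold):
    """Accept x as normal when its aggregated k-NN distance is within threshold."""
    return knn_distance_sum(x, reference, k) <= threshold


# Hypothetical toy reference set: the corners of the unit square.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(accepts((0.5, 0.5), reference, k=2, threshold=2.0))  # close to the data
print(accepts((5.0, 5.0), reference, k=2, threshold=2.0))  # far from the data
```

Passing a subset such as `reference[:2]` instead of the full list shows the idea the paper investigates: the same rule can be evaluated against a smaller reference set, at the cost of possibly changing which objects are accepted.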