The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method

  • Authors: Chien-Hsing Chou, Bo-Han Kuo, Fu Chang
  • Affiliation (all authors): Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C.
  • Venue: ICPR '06 Proceedings of the 18th International Conference on Pattern Recognition - Volume 02
  • Year: 2006

Abstract

In this paper, we propose a new data reduction algorithm that iteratively selects some samples and ignores others that can be absorbed, or represented, by those already selected. The algorithm differs from the condensed nearest neighbor (CNN) rule in that it employs a strong absorption criterion, whereas CNN employs a weak one; hence, we call it the generalized CNN (GCNN) algorithm. The new criterion makes CNN a special case of GCNN and, under certain conditions, allows GCNN to achieve consistency, that is, asymptotic Bayes-risk efficiency. Moreover, GCNN can yield significantly better accuracy than other instance-based data reduction methods. We demonstrate this last claim through experiments on five datasets, some of which contain a very large number of samples.
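
The abstract does not spell out the two absorption criteria, so the following is only a minimal sketch of the select-or-absorb loop under stated assumptions. It assumes that a sample counts as absorbed when its nearest selected sample of a different label is farther away than its nearest selected sample of the same label by more than a margin, and that the margin is `rho` times the smallest distance between any two samples with different labels. With `rho = 0` this reduces to the usual CNN test (the nearest selected sample has the same label). The names `condense`, `is_absorbed`, `min_interclass_distance`, and `rho` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def min_interclass_distance(X, y):
    """Smallest distance between two samples with different labels (inf if one class)."""
    best = math.inf
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if y[i] != y[j]:
                best = min(best, math.dist(X[i], X[j]))
    return best


def is_absorbed(x, label, prototypes, margin):
    """Assumed criterion: the nearest other-label prototype is farther than the
    nearest same-label prototype by more than `margin`."""
    same = min((math.dist(x, p) for p, lp in prototypes if lp == label),
               default=math.inf)
    other = min((math.dist(x, p) for p, lp in prototypes if lp != label),
                default=math.inf)
    if same == math.inf:  # no prototype of this label has been selected yet
        return False
    return other - same > margin


def condense(X, y, rho=0.0):
    """Return indices of the selected samples; rho = 0 gives the CNN-style test."""
    delta = min_interclass_distance(X, y)
    margin = rho * delta if math.isfinite(delta) else 0.0
    selected, chosen = [], set()
    changed = True
    while changed:  # repeat passes until a full pass selects nothing new
        changed = False
        for i, (x, label) in enumerate(zip(X, y)):
            if i in chosen:
                continue
            prototypes = [(X[k], y[k]) for k in selected]
            if not is_absorbed(x, label, prototypes, margin):
                selected.append(i)
                chosen.add(i)
                changed = True
    return selected
```

Calling `condense(X, y, rho=0.0)` on a list of coordinate tuples `X` and labels `y` returns the indices kept under the weak test; a larger `rho` (below 1) applies the stricter test and keeps at least as many samples near class boundaries in simple cases. Each pass costs a scan over all selected samples per candidate, so this sketch suits small datasets only.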