Efficient dataset size reduction by finding homogeneous clusters

  • Authors:
  • Stefanos Ougiaroglou;Georgios Evangelidis

  • Affiliations:
  • University of Macedonia, Thessaloniki, Greece;University of Macedonia, Thessaloniki, Greece

  • Venue:
  • Proceedings of the Fifth Balkan Conference in Informatics
  • Year:
  • 2012

Abstract

Although the k-Nearest Neighbor classifier is one of the most widely used classification methods, it suffers from high computational cost and storage requirements. These drawbacks have been an active research topic over recent decades. This paper proposes an effective data reduction algorithm that has low preprocessing cost and reduces storage requirements while keeping classification accuracy at an acceptably high level. The proposed algorithm is based on a fast preprocessing clustering procedure that creates homogeneous clusters, that is, clusters whose items all belong to the same class. The centroids of these clusters constitute the reduced training set. Experimental results on real-life datasets show that the proposed algorithm is faster and achieves higher reduction rates than three known existing methods, without significantly reducing classification accuracy.
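
The idea of replacing a training set with the centroids of homogeneous clusters can be illustrated with a minimal sketch. The abstract does not spell out the clustering procedure, so everything below is an assumption made for illustration: non-homogeneous clusters are split recursively with a plain k-means seeded by the per-class means, and the function names kmeans_assign and reduce_by_homogeneous_clusters are hypothetical. It should not be read as the authors' exact algorithm.

```python
import numpy as np

def kmeans_assign(X, init, iters=20):
    """Plain Lloyd's k-means from given initial centroids; returns cluster ids."""
    centroids = init.copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        new = centroids.copy()
        for j in range(len(centroids)):
            if np.any(assign == j):
                new[j] = X[assign == j].mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign

def reduce_by_homogeneous_clusters(X, y):
    """Return (centroids, labels) of single-class clusters.

    Assumption (not stated in the abstract): a cluster holding several
    classes is split by k-means seeded with the mean of each class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    reduced_X, reduced_y = [], []
    stack = [np.arange(len(X))] if len(X) else []
    while stack:
        idx = stack.pop()
        classes = np.unique(y[idx])
        if len(classes) == 1:
            # Homogeneous cluster: keep only its centroid and class label.
            reduced_X.append(X[idx].mean(axis=0))
            reduced_y.append(classes[0])
            continue
        init = np.array([X[idx][y[idx] == c].mean(axis=0) for c in classes])
        assign = kmeans_assign(X[idx], init)
        parts = [idx[assign == j] for j in range(len(classes))]
        parts = [p for p in parts if len(p) > 0]
        if len(parts) == 1:
            # k-means made no progress (e.g. coincident points with different
            # labels); split by class so the loop is guaranteed to terminate.
            parts = [idx[y[idx] == c] for c in classes]
        stack.extend(parts)
    return np.array(reduced_X), np.array(reduced_y)
```

The returned centroids and labels can then serve as the training set of any k-NN classifier. How much the data shrinks depends on how well the classes separate: heavily overlapping classes lead to many small clusters and therefore a lower reduction rate.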