On k-NN Method with Preprocessing

Authors:
Zbigniew Suraj; Pawel Delimata
Affiliations:
University of Information Technology and Management, H. Sucharskiego 2, 35-225 Rzeszow, Poland. E-mail: zsuraj@wsiz.rzeszow.pl; University of Rzeszow, Rejtana 16A, 35-310 Rzeszow, Poland. E-mail: pdelimata@wp.pl
Venue:
Fundamenta Informaticae
Year:
2005

Abstract

This study introduces a new model of data classification based on preliminary reduction of the training set of examples (preprocessing), intended to facilitate the use of nearest neighbours (NN) techniques in near real-time applications. It accordingly addresses the problem of minimising the computational resources, both memory and time, that NN techniques require. The proposed approach is a modification of the classical k-Nearest Neighbours (k-NN) method and of the k-NN method with local metric induction. The k-NN method with local metric induction generally gives better results than the classical k-NN method when classifying new examples. For large data sets, however, it is less time-efficient than the classical method. The time and space efficiency of classification algorithms based on these two methods depends not only on the chosen metric but also on the size of the training data. In the paper, we present three methods of preliminary reduction of the training set of examples. All three decrease the size of a given experimental data set while preserving relatively high classification accuracy. Results of experiments conducted on well-known data sets demonstrate the potential benefits of such reduction methods.
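The general idea of reducing the training set before running k-NN can be illustrated with a minimal sketch. The abstract does not describe the paper's three reduction methods, so the example below substitutes a different, well-known technique, Hart's condensed nearest neighbour rule, purely as a stand-in. The function names (`euclidean`, `knn_predict`, `condense`), the Euclidean metric, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=1):
    """Classical k-NN: majority label among the k nearest training examples.

    `train` is a list of (features, label) pairs.
    """
    neighbours = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def condense(train):
    """Stand-in reduction step (Hart's condensed nearest neighbour rule).

    Starts from one example and adds any training example that 1-NN on the
    kept subset misclassifies, repeating until a full pass adds nothing.
    This is NOT one of the paper's methods; it only illustrates the pattern.
    """
    kept = [train[0]]
    changed = True
    while changed:
        changed = False
        for ex in train:
            if ex in kept:
                continue
            if knn_predict(kept, ex[0], k=1) != ex[1]:
                kept.append(ex)
                changed = True
    return kept


# Hypothetical toy data: two well-separated classes in the plane.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"), ((0.3, 0.3), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b"), ((5.3, 5.1), "b"),
]

reduced = condense(train)
print(len(train), "->", len(reduced), "training examples")
print(knn_predict(reduced, (0.2, 0.2), k=1))
print(knn_predict(reduced, (4.9, 5.0), k=1))
```

Because each k-NN query scans the whole stored training set, query time and memory in this sketch scale with the number of retained examples, which is the cost that a reduction step is meant to cut. How much accuracy survives the reduction depends on the data and on the reduction method used.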