Towards Heterogeneous Similarity Function Learning for the k-Nearest Neighbors Classification

Authors:
Karol Grudziński
Affiliations:
Department of Physics, Kazimierz Wielki University, Bydgoszcz, Poland 85-072 and Institute of Applied Informatics, University of Economy, Bydgoszcz, Poland 85-229
Venue:
ICAISC '08 Proceedings of the 9th international conference on Artificial Intelligence and Soft Computing
Year:
2008

Abstract

To classify an unseen (query) vector q with the k-Nearest Neighbors method (k-NN), one computes a similarity function between q and the training vectors in a database. In the basic variant of the k-NN algorithm, the predicted class of q is the majority class among its k nearest neighbors. Different similarity functions may be applied, leading to different classification results. In this paper, a heterogeneous similarity function is constructed from different one-component metrics by minimizing the number of classification errors the system makes on a training set. On five tested datasets, the HSFL-NN system introduced in this paper gave better results on unseen samples than the plain k-NN method with an optimally selected k parameter and the optimal homogeneous similarity function.
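
The idea described in the abstract can be illustrated with a minimal sketch. It is not the HSFL-NN system itself: the abstract does not specify the set of one-component metrics or the search procedure, so the three component distances in COMPONENTS, the greedy coordinate-wise search, and the use of leave-one-out error on the training set are all assumptions made for illustration. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical one-component (per-feature) distances; the set used in the
# paper is not given in the abstract.
COMPONENTS = {
    "abs": lambda a, b: abs(a - b),
    "sq": lambda a, b: (a - b) ** 2,
    "overlap": lambda a, b: 0.0 if a == b else 1.0,
}

def distance(x, y, choice):
    """Heterogeneous distance: feature j uses the component named choice[j]."""
    return sum(COMPONENTS[name](a, b) for name, a, b in zip(choice, x, y))

def knn_predict(q, X, y, choice, k):
    """Majority class among the k training vectors nearest to q."""
    order = sorted(range(len(X)), key=lambda i: distance(q, X[i], choice))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_errors(X, y, choice, k):
    """Number of leave-one-out classification errors on the training set."""
    errors = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(X[i], rest_X, rest_y, choice, k) != y[i]:
            errors += 1
    return errors

def fit_heterogeneous(X, y, k=1):
    """Greedy search (an assumption): start from a homogeneous 'abs' distance
    and swap one feature's component at a time while the error count drops."""
    n_features = len(X[0])
    choice = ["abs"] * n_features
    best = loo_errors(X, y, choice, k)
    improved = True
    while improved:
        improved = False
        for j in range(n_features):
            for name in COMPONENTS:
                trial = choice[:j] + [name] + choice[j + 1:]
                err = loo_errors(X, y, trial, k)
                if err < best:  # accept strict improvements only
                    choice, best, improved = trial, err, True
    return choice, best

# Toy data, invented for illustration: feature 0 separates the classes,
# feature 1 is unrelated to the class label.
X_train = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 2.0], [1.1, 8.0], [1.2, 4.0]]
y_train = ["a", "a", "a", "b", "b", "b"]

learned_choice, learned_errors = fit_heterogeneous(X_train, y_train, k=1)
print(learned_choice, learned_errors)
print(knn_predict([0.05, 5.0], X_train, y_train, learned_choice, k=1))
```

Because the search only accepts strict improvements, the learned heterogeneous distance never makes more training errors than the homogeneous starting point. Whether that carries over to unseen samples is an empirical question, which is what the abstract reports on five datasets.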