Feature Extraction for the k-Nearest Neighbour Classifier with Genetic Programming

  • Authors: Martijn C. J. Bot
  • Affiliations: -
  • Venue: EuroGP '01, Proceedings of the 4th European Conference on Genetic Programming
  • Year: 2001


Abstract

In pattern recognition, the curse of dimensionality can be handled either by reducing the number of features, e.g. with decision trees, or by extracting new features. We propose a genetic programming (GP) framework for the automatic extraction of features, with the express aim of dimensionality reduction and the additional aim of improving the accuracy of the k-nearest neighbour (k-NN) classifier. We show that our system is capable of reducing most datasets to one or two features while k-NN accuracy improves or stays the same. Such a small number of features has the great advantage of allowing visual inspection of the dataset in a two-dimensional plot. Since k-NN is a non-linear classification algorithm [2], we compare several linear fitness measures. We show that a very simple one, the accuracy of the minimal-distance-to-means (mdm) classifier, outperforms all other fitness measures. We introduce a stopping criterion gleaned from numerical mathematics: a new feature is added only if the relative increase in training accuracy it yields exceeds a constant d, estimated at 3.3% for the mdm classifier.
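The two ingredients named in the abstract, the mdm fitness measure and the relative-increase stopping rule, can be sketched as follows. This is a minimal illustration under our own assumptions (Euclidean distance, NumPy arrays, and the function names are ours, not the paper's); the paper's GP system would evolve the feature-extraction expressions that produce the matrix `X`.

```python
import numpy as np

def mdm_accuracy(X, y):
    """Accuracy of the minimal-distance-to-means (mdm) classifier:
    each sample is assigned to the class whose mean is nearest
    (Euclidean distance assumed here)."""
    classes = np.unique(y)
    # one mean vector per class, stacked into a (n_classes, n_features) array
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # distance from every sample to every class mean
    dist = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    pred = classes[dist.argmin(axis=1)]
    return float((pred == y).mean())

def keep_new_feature(acc_old, acc_new, d=0.033):
    """Stopping criterion: accept the newly extracted feature only if
    the relative increase in training accuracy exceeds d
    (the paper estimates d at 3.3% for the mdm classifier)."""
    return (acc_new - acc_old) / acc_old > d
```

For example, two well-separated clusters give `mdm_accuracy` of 1.0, and a jump in training accuracy from 0.80 to 0.85 is a relative increase of 6.25%, so the candidate feature would be kept, whereas 0.90 to 0.91 (about 1.1%) would stop the search.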