Common classifier models are designed to achieve high accuracy, while often neglecting the question of interpretability. In particular, most classifiers do not allow for drawing conclusions about the structure and quality of the underlying training data. Keeping the classifier model simple makes an intuitive interpretation of the model and the corresponding training data possible. The lower accuracy of such simple models can be compensated for by accumulating the decisions of several classifiers. We propose an approach that is particularly suitable for high-dimensional data sets of low cardinality, such as data obtained from high-throughput biomolecular experiments. Here, simple base classifiers are obtained by choosing one data point of each class as a prototype for nearest neighbour classification. By enumerating all such classifiers for a specific data set, one can obtain a systematic description of the data structure in terms of class coherence. We also investigate the performance of the classifiers in cross-validation experiments by applying stand-alone prototype classifiers as well as ensembles of selected prototype classifiers.
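The scheme above can be sketched in a few lines: a base classifier is defined by one prototype per class and assigns a query point the class of its nearest prototype; enumerating all prototype pairs yields the full set of base classifiers, whose votes can be accumulated into an ensemble decision. This is a minimal illustrative sketch for two classes, not the authors' implementation; all function names are hypothetical.

```python
from itertools import product

def prototype_classifier(p0, p1):
    """Base classifier defined by one prototype per class (p0 for
    class 0, p1 for class 1): assign the class of the nearer prototype."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return lambda x: 0 if sqdist(x, p0) <= sqdist(x, p1) else 1

def enumerate_classifiers(X0, X1):
    """Enumerate every base classifier: one prototype drawn from each
    class, i.e. |X0| * |X1| classifiers in total."""
    return [prototype_classifier(p0, p1) for p0, p1 in product(X0, X1)]

def ensemble_predict(classifiers, x):
    """Accumulate the base decisions by majority vote."""
    votes = sum(c(x) for c in classifiers)
    return 1 if 2 * votes > len(classifiers) else 0

# Toy data: two well-separated classes in 2-D.
X0 = [(0.0, 0.0), (0.2, 0.1)]
X1 = [(1.0, 1.0), (0.9, 1.1)]
clfs = enumerate_classifiers(X0, X1)        # 2 * 2 = 4 base classifiers
print(len(clfs))                            # 4
print(ensemble_predict(clfs, (0.1, 0.0)))   # 0
print(ensemble_predict(clfs, (1.0, 0.9)))   # 1
```

For c classes with n_i points each, the full enumeration yields the product of the class sizes, which stays manageable exactly in the low-cardinality setting the abstract targets.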