Combining feature and example pruning by uncertainty minimization

  • Authors:
  • Marc Sebban; Richard Nock

  • Affiliations:
  • Dept of SJE, French West Indies and Guiana University, Pointe-à-Pitre Cedex; Dept of Mathematics and CS, French West Indies and Guiana University, Pointe-à-Pitre Cedex

  • Venue:
  • UAI'00: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence
  • Year:
  • 2000

Abstract

We focus in this paper on dataset reduction techniques for use in k-nearest neighbor classification. In such a context, feature and prototype selection have always been treated independently by the standard storage reduction algorithms. While this separation is theoretically justified by the fact that each subproblem is NP-hard, we argue in this paper that a joint storage reduction is in fact more intuitive and can in practice provide better results than two independent processes. Moreover, it avoids many distance calculations by progressively removing useless instances during the feature pruning. While standard selection algorithms often optimize accuracy to discriminate among candidate solutions, we use in this paper a criterion based on an uncertainty measure within a nearest-neighbor graph. This choice is motivated by recent results proving that accuracy is not always the most suitable criterion to optimize. In our approach, a feature or an instance is removed if its deletion improves the information of the graph. Numerous experiments are presented in this paper, and a statistical analysis shows the relevance of our approach and its tolerance to noise.
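
To make the idea concrete, here is a minimal, illustrative sketch of this kind of joint pruning, not the authors' actual algorithm: it uses the mean label entropy among each point's k nearest neighbors as a stand-in for the paper's uncertainty measure, and greedily deletes any feature or instance whose removal does not increase that uncertainty. The names `knn_uncertainty` and `joint_prune`, the entropy criterion, and the greedy acceptance rule are all assumptions made for illustration.

```python
import numpy as np

def knn_uncertainty(X, y, k=3):
    """Mean label entropy over each point's k nearest neighbors.

    A stand-in for the paper's graph-based uncertainty measure;
    the exact criterion used by Sebban & Nock may differ.
    """
    n = len(X)
    assert n > k, "need more instances than neighbors"
    # Pairwise squared Euclidean distances (naively recomputed each call).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors per point
    total = 0.0
    for i in range(n):
        _, counts = np.unique(y[nbrs[i]], return_counts=True)
        p = counts / k
        total -= (p * np.log2(p)).sum()    # 0 when all neighbors agree
    return total / n

def joint_prune(X, y, k=3):
    """Greedy joint pruning: delete a feature or an instance whenever the
    deletion does not increase the k-NN graph uncertainty (ties favor the
    smaller representation)."""
    y = np.asarray(y)
    feats = list(range(X.shape[1]))
    inst = list(range(X.shape[0]))
    best = knn_uncertainty(X[np.ix_(inst, feats)], y[inst], k)
    improved = True
    while improved:
        improved = False
        for f in list(feats):              # candidate feature deletions
            if len(feats) > 1:
                cand = [g for g in feats if g != f]
                u = knn_uncertainty(X[np.ix_(inst, cand)], y[inst], k)
                if u <= best:
                    feats, best, improved = cand, u, True
        for i in list(inst):               # candidate instance deletions
            if len(inst) > k + 1:
                cand = [j for j in inst if j != i]
                u = knn_uncertainty(X[np.ix_(cand, feats)], y[cand], k)
                if u <= best:
                    inst, best, improved = cand, u, True
    return feats, inst

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only two informative features
    feats, inst = joint_prune(X, y, k=3)
    print("kept features:", feats, "kept instances:", len(inst))
```

Interleaving the two deletion loops is what makes the reduction joint: instances discarded early shrink the distance computations that every subsequent feature evaluation must perform, which is the saving the abstract alludes to.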