The main focus of theoretical models of machine learning is to formally describe what it means for a concept to be learnable, what a learning process is, and what the relationship is between a learning agent and a teaching one. However, proving from a theoretical point of view that a concept is learnable gives no a priori idea of how difficult the target concept is to learn. In this paper, after reviewing the relevant theoretical concepts and the main estimation methods, we propose a learning-system-independent measure of the difficulty of learning a concept. It rests on geometrical and statistical notions and on the implicit assumption that distinct classes occupy distinct regions of the feature space. In this context, we identify learnability with the level of separability in the feature space. Our definition is constructive, is based on a statistical test, and has been evaluated on problems from the UCI repository. The results are convincing and fit well with both theoretical results and intuition. Finally, to reduce the computational cost of our approach, we propose a new way of characterizing the geometrical regions using a k-nearest-neighbors graph. We show experimentally that it yields accuracy estimates close to those obtained by leave-one-out cross-validation, and with smaller standard deviation.
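As a rough illustration of the kind of k-nearest-neighbors-graph separability measure the abstract describes, one can score a labeled dataset by the fraction of k-NN edges that connect points of the same class. This is a hedged sketch only: the function name `knn_separability` and the same-class-edge statistic are illustrative choices, not the paper's exact definition or statistical test.

```python
import math

def knn_separability(points, labels, k=3):
    """Fraction of k-nearest-neighbor edges joining same-class points.

    A value near 1.0 suggests classes occupy distinct regions of the
    feature space (illustrative statistic, not the paper's definition).
    """
    n = len(points)
    same = total = 0
    for i in range(n):
        # Brute-force neighbor search; a k-d tree would scale better.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            total += 1
            if labels[i] == labels[j]:
                same += 1
    return same / total

# Two well-separated clusters: every k-NN edge stays within a class.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
lab = [0, 0, 0, 1, 1, 1]
print(knn_separability(pts, lab, k=2))  # -> 1.0
```

On heavily overlapping classes the same score would drop toward the chance level, which is the intuition behind linking separability to learning difficulty.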