The main focus of theoretical models of machine learning is to formally describe what it means for a concept to be learnable, what a learning process is, and what the relationship is between a learning agent and a teaching one. However, proving from a theoretical point of view that a concept is learnable gives no a priori idea of how difficult the target concept is to learn. In this paper, after reviewing the relevant theoretical concepts and the main estimation methods, we propose a learning-system-independent measure of the difficulty of learning a concept. It rests on geometrical and statistical notions and on the implicit assumption that distinct classes occupy distinct regions of the feature space. In this context, we identify learnability with the level of separability in the feature space. Our definition is constructive, is based on a statistical test, and has been evaluated on problems from the UCI repository. The results are convincing and fit well with both theoretical results and intuition. Finally, to reduce the computational cost of our approach, we propose a new way of characterizing the geometrical regions using a k-nearest-neighbors graph. We show experimentally that it yields accuracy estimates close to those obtained by leave-one-out cross-validation, and with smaller standard deviation.
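As a rough illustration of the kind of k-nearest-neighbors-graph separability measure the abstract describes, one can score a labeled dataset by the fraction of k-NN edges that connect points of the same class. This is a hedged sketch only: the function name `knn_separability` and the same-class-edge statistic are illustrative choices, not the paper's exact definition or statistical test.

```python
import math

def knn_separability(points, labels, k=3):
    """Fraction of k-nearest-neighbor edges joining same-class points.

    A value near 1.0 suggests classes occupy distinct regions of the
    feature space (illustrative statistic, not the paper's definition).
    """
    n = len(points)
    same = total = 0
    for i in range(n):
        # Brute-force neighbor search; a k-d tree would scale better.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            total += 1
            if labels[i] == labels[j]:
                same += 1
    return same / total

# Two well-separated clusters: every k-NN edge stays within a class.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
lab = [0, 0, 0, 1, 1, 1]
print(knn_separability(pts, lab, k=2))  # -> 1.0
```

On heavily overlapping classes the same score would drop toward the chance level, which is the intuition behind linking separability to learning difficulty.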