The present paper elucidates a universal property of learning curves, which shows how the generalization error, the training error, and the complexity of the underlying stochastic machine are related, and how the behavior of a stochastic machine improves as the number of training examples increases. The error is measured by the entropic loss. It is proved that the generalization error converges to H_0, the entropy of the conditional distribution of the true machine, as H_0 + m*/(2t), while the training error converges as H_0 - m*/(2t), where t is the number of examples and m* represents the complexity of the network. When the model is faithful, meaning that the true machine is contained in the model, m* reduces to m, the number of modifiable parameters. This is a universal law because it holds for any regular machine under the maximum likelihood estimator, irrespective of its structure. Similar relations are obtained for the Bayes and Gibbs learning algorithms. These learning curves show the relation among the accuracy of learning, the complexity of a model, and the number of training examples.
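As a rough illustration of these asymptotic relations, the sketch below computes the predicted generalization error, training error, and their gap m*/t for the maximum likelihood case. It is a minimal sketch only; the values of H0, m_star, and the sample sizes are hypothetical and not taken from the paper.

```python
# Asymptotic learning-curve relations under entropic loss (maximum likelihood case):
#   E[generalization error] ~ H0 + m_star / (2 t)
#   E[training error]       ~ H0 - m_star / (2 t)
# Their gap is m_star / t, which shrinks as the number of examples t grows.

def asymptotic_errors(H0: float, m_star: float, t: int) -> tuple[float, float]:
    """Return (generalization error, training error) predicted for t examples."""
    gen = H0 + m_star / (2 * t)
    train = H0 - m_star / (2 * t)
    return gen, train

if __name__ == "__main__":
    H0 = 0.5       # entropy of the true conditional distribution (hypothetical value)
    m_star = 100   # complexity term; equals m, the number of parameters, for a faithful model
    for t in (100, 1000, 10000):
        gen, train = asymptotic_errors(H0, m_star, t)
        print(f"t={t:6d}  gen={gen:.4f}  train={train:.4f}  gap={gen - train:.4f}")
```

Note that both curves approach H_0 from opposite sides, so the observable gap between training and generalization error provides a handle on the complexity term m*.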