Learning curves for error minimum and maximum likelihood algorithms

Authors:
Y. Kabashima; S. Shinomoto
Affiliations:
Department of Physics, Kyoto University, Kyoto 606, Japan; Department of Physics, Kyoto University, Kyoto 606, Japan
Venue:
Neural Computation
Year:
1992

Abstract

For the problem of dividing the space originally partitioned by a blurred boundary, every learning algorithm can make the probability of incorrect prediction of an individual example, $\epsilon$, decrease with the number of training examples $t$. We address here the question of how the asymptotic form of $\epsilon(t)$, as well as its limit of convergence, reflects the choice of learning algorithm. The error minimum algorithm is found to exhibit rather slow convergence of $\epsilon(t)$ to its lower bound $\epsilon_0$: $\epsilon(t) - \epsilon_0 \sim O(t^{-2/3})$. Even for the purpose of minimizing prediction error, the maximum likelihood algorithm can be utilized as an alternative. If the true probability distribution happens to be contained in the family of hypothetical functions, then the boundary estimated from the hypothetical distribution function eventually converges to the best choice. Convergence of the prediction error is then $\epsilon(t) - \epsilon_0 \sim O(t^{-1})$. If the true distribution is not available from the algorithm, however, the boundary generally does not converge to the best choice, but instead $\epsilon(t) - \epsilon_1 \sim O(t^{-1/2})$, where $\epsilon_1 > \epsilon_0 > 0$.
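The error minimum result can be explored numerically with a toy problem. The sketch below is not the authors' model: it assumes a one-dimensional input uniform on [-1, 1], a logistic blurred boundary centered at 0 with a hypothetical `width` parameter, and a single-threshold classifier. All function names are illustrative. It estimates the average excess error $\epsilon(t) - \epsilon_0$ of the threshold that minimizes the number of training errors.

```python
import math
import random


def p_plus(x, width=0.5):
    """Assumed blurred boundary: P(label = +1 | x), logistic and centered at 0."""
    return 1.0 / (1.0 + math.exp(-x / width))


def prediction_error(theta, width=0.5, grid=2000):
    """Probability of mispredicting a fresh example when predicting +1 iff
    x > theta, for x uniform on [-1, 1] (midpoint-rule integration)."""
    total = 0.0
    for i in range(grid):
        x = -1.0 + (2.0 * i + 1.0) / grid
        p = p_plus(x, width)
        total += (1.0 - p) if x > theta else p
    return total / grid


def error_minimum_threshold(xs, ys):
    """Return a threshold with the fewest training errors.
    Ties are broken in favor of the smallest threshold (an arbitrary choice)."""
    pairs = sorted(zip(xs, ys))
    # Threshold below every point: everything is predicted +1,
    # so each -1 label is a training error.
    errors = sum(1 for _, y in pairs if y == -1)
    best_errors, best_theta = errors, -1.0
    for i, (x, y) in enumerate(pairs):
        # Moving the threshold past x flips its prediction to -1.
        errors += 1 if y == 1 else -1
        if errors < best_errors:
            nxt = pairs[i + 1][0] if i + 1 < len(pairs) else 1.0
            best_errors, best_theta = errors, 0.5 * (x + nxt)
    return best_theta


def excess_error(t, trials=200, seed=0, width=0.5):
    """Monte Carlo estimate of the mean of epsilon(t) - epsilon_0."""
    rng = random.Random(seed)
    eps0 = prediction_error(0.0, width)  # the best threshold is 0 by symmetry
    total = 0.0
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(t)]
        ys = [1 if rng.random() < p_plus(x, width) else -1 for x in xs]
        theta = error_minimum_threshold(xs, ys)
        total += prediction_error(theta, width) - eps0
    return total / trials


if __name__ == "__main__":
    for t in (50, 200, 800, 3200):
        e = excess_error(t)
        # If the t^(-2/3) law holds here, the last column should level off.
        print(t, e, e * t ** (2.0 / 3.0))
```

If the $O(t^{-2/3})$ scaling carries over to this toy setting, the rescaled column should become roughly constant for large $t$. The estimate is noisy for small `trials`, so treat it as a qualitative check only.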