We study the interaction between input distributions, learning algorithms, and finite sample sizes in learning classification tasks. Focusing on normal input distributions, we use statistical mechanics techniques to calculate the empirical and expected (or generalization) errors for several well-known algorithms that learn the weights of a single-layer perceptron. For spherically symmetric distributions within each class, we find that the simple Hebb rule, which corresponds to maximum-likelihood parameter estimation, outperforms the other, more complex algorithms based on error minimization. Moreover, we show that in the regime where the overlap between the classes is large, algorithms with low empirical error generalize worse, a phenomenon known as overtraining.
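The setting described above can be illustrated with a minimal numerical sketch. The class means, dimensions, and sample sizes below are illustrative assumptions, not the paper's actual parameters: two spherically symmetric Gaussian classes with means ±b, a Hebb-rule weight vector (the label-weighted mean of the inputs, which for spherical Gaussians is, up to scale, the maximum-likelihood estimate of the mean-difference direction), and a comparison of the empirical error on the training sample against the generalization error estimated on a fresh sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: two spherical Gaussian classes with means +/- b.
d, n = 20, 200
b = 0.5 * np.ones(d) / np.sqrt(d)
x = np.vstack([rng.normal(size=(n, d)) + b,
               rng.normal(size=(n, d)) - b])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Hebb rule: w is the label-weighted mean of the training inputs.
w = (y[:, None] * x).mean(axis=0)

# Empirical (training) error of the linear classifier sign(w . x).
emp_err = np.mean(np.sign(x @ w) != y)

# Generalization error, estimated on a large fresh sample from the
# same input distribution.
m = 10000
xt = np.vstack([rng.normal(size=(m, d)) + b,
                rng.normal(size=(m, d)) - b])
yt = np.concatenate([np.ones(m), -np.ones(m)])
gen_err = np.mean(np.sign(xt @ w) != yt)

print(f"empirical error: {emp_err:.3f}, generalization error: {gen_err:.3f}")
```

With the classes overlapping as above, both errors are nonzero; the gap between them is the quantity whose behavior (including the overtraining regime) the analysis characterizes.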