Empirical risk minimization versus maximum-likelihood estimation: a case study

  • Authors:
  • Ronny Meir

  • Venue:
  • Neural Computation
  • Year:
  • 1995

Abstract

We study the interaction between input distributions, learning algorithms, and finite sample sizes in the case of learning classification tasks. Focusing on the case of normal input distributions, we use statistical mechanics techniques to calculate the empirical and expected (or generalization) errors for several well-known algorithms learning the weights of a single-layer perceptron. In the case of spherically symmetric distributions within each class, we find that the simple Hebb rule, corresponding to maximum-likelihood parameter estimation, outperforms the other, more complex algorithms based on error minimization. Moreover, we show that in the regime where the overlap between the classes is large, algorithms with low empirical error do worse in terms of generalization, a phenomenon known as overtraining.
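
The comparison in the abstract can be illustrated numerically. The sketch below is not the paper's statistical-mechanics calculation; it merely sets up a hypothetical scenario in the same spirit: two spherically symmetric Gaussian classes with means ±mu, a Hebb-rule weight vector (the maximum-likelihood estimate of the class-center difference under this Gaussian model), and a perceptron-style empirical-risk minimizer. The dimensions, sample sizes, and the overlap parameter are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two spherically symmetric Gaussian classes in d dimensions,
# centered at +mu and -mu; |mu| controls the overlap between the classes.
d, n_train, n_test = 20, 100, 10000
mu = np.full(d, 0.2)

def sample(n):
    y = rng.choice([-1, 1], size=n)
    X = rng.standard_normal((n, d)) + y[:, None] * mu
    return X, y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

# Hebb rule: w = average of y_i * x_i, the ML estimate of the class-center
# difference under the assumed Gaussian model.
w_hebb = (y_tr[:, None] * X_tr).mean(axis=0)

# Empirical-risk minimization: perceptron-style updates that drive the
# training (empirical) error down.
w_erm = np.zeros(d)
for _ in range(200):
    for x, y in zip(X_tr, y_tr):
        if y * (x @ w_erm) <= 0:
            w_erm += y * x

def error(w, X, y):
    return np.mean(np.sign(X @ w) != y)

for name, w in [("Hebb (ML)", w_hebb), ("ERM (perceptron)", w_erm)]:
    print(f"{name:17s} train={error(w, X_tr, y_tr):.3f}  test={error(w, X_te, y_te):.3f}")
```

With a small |mu| (large class overlap), runs of this sketch tend to show the ERM solution reaching a lower training error while the Hebb/ML solution generalizes at least as well, which is the qualitative pattern the abstract describes as overtraining.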