The general inefficiency of batch training for gradient descent learning

  • Authors:
  • D. Randall Wilson; Tony R. Martinez

  • Affiliations:
  • Fonix Corporation, 180 West Election Road Suite 200, Draper, UT; Computer Science Department, 3361 TMCB, Brigham Young University, Provo, UT

  • Venue:
  • Neural Networks
  • Year:
  • 2003

Abstract

Gradient descent training of neural networks can be done in either a batch or on-line manner. A widely held myth in the neural network community is that batch training is as fast as, or faster than, on-line training, and/or more 'correct', because it supposedly uses a better approximation of the true gradient for its weight updates. This paper explains why batch training is almost always slower than on-line training, often by orders of magnitude, especially on large training sets. The main reason is that on-line training can follow curves in the error surface throughout each epoch, which allows it to safely use a larger learning rate and thus converge in fewer iterations through the training data. Empirical results on a large (20,000-instance) speech recognition task and on 26 other learning tasks demonstrate that convergence can be reached significantly faster using on-line training than batch training, with no apparent difference in accuracy.
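
The difference between the two update schemes can be sketched in a few lines. The example below is a minimal, hypothetical NumPy illustration, not the paper's speech-recognition experiments: the linear least-squares model, synthetic data, learning rate, and epoch count are all assumptions made for clarity. It shows batch training accumulating the gradient over the whole training set and applying one weight update per epoch, while on-line training updates the weights after every instance and so takes many small steps along the error surface within the same pass through the data.

    # Illustrative sketch of batch vs. on-line gradient descent updates.
    # Model, data, and hyperparameters are assumptions for demonstration only.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))              # 1000 training instances, 10 features
    true_w = rng.normal(size=10)
    y = X @ true_w + 0.1 * rng.normal(size=1000)  # noisy linear targets

    def batch_epoch(w, lr):
        """Accumulate the gradient over the whole training set, then update once."""
        grad = np.zeros_like(w)
        for x_i, y_i in zip(X, y):
            grad += (x_i @ w - y_i) * x_i        # gradient of 0.5 * (x.w - y)^2
        return w - lr * grad / len(X)

    def online_epoch(w, lr):
        """Update the weights after every training instance (on-line training)."""
        for x_i, y_i in zip(X, y):
            w = w - lr * (x_i @ w - y_i) * x_i
        return w

    w_batch = np.zeros(10)
    w_online = np.zeros(10)
    for _ in range(20):
        w_batch = batch_epoch(w_batch, lr=0.05)
        w_online = online_epoch(w_online, lr=0.05)

    def mse(w):
        return float(np.mean((X @ w - y) ** 2))

    print(f"batch MSE after 20 epochs:   {mse(w_batch):.4f}")
    print(f"on-line MSE after 20 epochs: {mse(w_online):.4f}")

With the same per-update learning rate, the on-line variant performs one weight update per instance (1000 per epoch here) instead of one per epoch, which is the intuition behind the abstract's claim that on-line training typically reaches convergence in far fewer passes through the training data.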