Mathematical and Computer Modelling: An International Journal
In the majority of existing supervised learning paradigms, a neural network is trained by minimizing an error function with a learning rule. The learning rules in common use are gradient-based, such as the popular backpropagation algorithm. This paper addresses an important issue in supervised learning of neural networks with gradient-based learning rules: error minimization. It characterizes the asymptotic properties of training errors for various forms of neural networks in supervised learning and, through remarks and examples, discusses the practical implications for neural network design. The analytical results reveal how the quality of supervised learning depends on the rank of the training samples and the associated steady activation states, and they also reveal the complexity of achieving a zero training error.
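The dependence of achievable training error on the rank of the training samples can be illustrated with a minimal sketch (not from the paper): a one-layer linear network trained by plain gradient descent on the squared error. All names here (`X`, `Y`, `W`, `lr`) are illustrative assumptions. When the input samples are linearly independent, gradient descent can drive the training error to zero; when two identical samples carry different targets (so the sample matrix is rank-deficient), no weight setting fits both and the error plateaus above zero.

```python
import numpy as np

def train_error(X, Y, lr=0.01, steps=5000):
    """Minimize ||X @ W - Y||**2 by gradient descent; return the final error."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        W -= lr * (X.T @ (X @ W - Y))  # gradient of 0.5 * ||X @ W - Y||**2
    return float(np.sum((X @ W - Y) ** 2))

Y = np.array([[1.0], [-1.0], [0.5], [2.0]])

# Full-rank inputs: four linearly independent samples, so zero training
# error is attainable and gradient descent approaches it.
X_full = np.eye(4)
err_full = train_error(X_full, Y)   # essentially zero

# Rank-deficient inputs: samples 0 and 1 are identical but carry different
# targets (+1 and -1), so no weights fit both; the best attainable error for
# that pair is (1 - 0)**2 + (-1 - 0)**2 = 2, and training plateaus there.
X_def = X_full.copy()
X_def[1] = X_def[0]
err_def = train_error(X_def, Y)     # close to 2
```

This is the simplest instance of the phenomenon the abstract describes: the residual training error is determined not by the optimizer but by the rank structure of the training set itself.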