We investigate the effects of including selected lateral interconnections in a feedforward neural network. In a network with one hidden layer of m hidden neurons labeled 1, 2, ..., m, hidden neuron j is fully connected to the inputs, to the outputs, and to hidden neuron j + 1. As a consequence of the lateral connections, each hidden neuron receives two error signals: one from the output layer and one through the lateral interconnection. We show that these lateral interconnections among the hidden-layer neurons facilitate controlled role assignment and specialization of the hidden-layer neurons. In particular, as training progresses, hidden neurons become progressively specialized, starting from the fringes (i.e., the lowest- and highest-numbered hidden neurons, e.g., 1, 2, m - 1, m) and leaving the neurons in the center of the hidden layer (i.e., those numbered close to m/2) unspecialized or functionally identical. Consequently, the network behaves like network-growing algorithms without the explicit need to add hidden units, and like soft weight sharing because of the functionally identical neurons in the center of the hidden layer. Experimental results from one classification problem and one function approximation problem illustrate the selective specialization of the hidden-layer neurons. In addition, the improved generalization that results from a decrease in the effective number of free parameters is illustrated with a simple function approximation example and with a real-world data set. Beyond the reduction in the number of free parameters, the localization of weight sharing may also enable a procedure for determining the number of hidden-layer neurons required for a given learning task.
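The architecture and the two error signals described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the activation function, weight initialization, and loss (sigmoid units, small Gaussian weights, squared error) are assumptions, since the abstract does not specify them. Hidden neuron j receives lateral input from neuron j - 1 on the forward pass, so during backpropagation its error combines a signal from the output layer with a second signal arriving through the lateral link from neuron j + 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 3 inputs, m = 5 hidden neurons, 2 outputs.
n_in, m, n_out = 3, 5, 2
W_in = rng.normal(0, 0.1, (m, n_in))    # input -> hidden weights
W_lat = rng.normal(0, 0.1, m - 1)       # lateral weight from neuron j to j+1
W_out = rng.normal(0, 0.1, (n_out, m))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Hidden activations are computed in order 1..m because neuron j
    # receives lateral input from neuron j-1.
    h = np.zeros(m)
    for j in range(m):
        net = W_in[j] @ x
        if j > 0:
            net += W_lat[j - 1] * h[j - 1]  # lateral input from neuron j-1
        h[j] = sigmoid(net)
    y = sigmoid(W_out @ h)
    return h, y

def hidden_deltas(x, target):
    # Each hidden neuron gets two error signals: one from the outputs and,
    # except for neuron m, one through the lateral link from neuron j+1,
    # so deltas are computed in reverse order m..1.
    h, y = forward(x)
    delta_out = (y - target) * y * (1 - y)  # squared-error loss, sigmoid output
    delta_h = np.zeros(m)
    for j in reversed(range(m)):
        err = W_out[:, j] @ delta_out       # error signal from the output layer
        if j < m - 1:
            err += W_lat[j] * delta_h[j + 1]  # error signal via lateral link
        delta_h[j] = err * h[j] * (1 - h[j])
    return delta_h
```

The reverse-order delta computation is what lets error information propagate along the chain of lateral links, which is the mechanism the paper credits for specialization spreading inward from hidden neurons 1, 2 and m - 1, m toward the center of the layer.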