Pruning recurrent neural networks for improved generalization performance

Authors:
C. L. Giles; C. W. Omlin
Affiliations:
NEC Res. Inst., Princeton, NJ
Venue:
IEEE Transactions on Neural Networks
Year:
1994

Citing 0
Cited 8

Neural input selection-A fast model-based approach (Neurocomputing)
Soft-computing techniques and ARMA model for time series prediction (Neurocomputing)
Third-order generalization: A new approach to categorizing higher-order generalization (Neurocomputing)
Letters: A neural network to solve the hybrid N-parity: Learning with generalization issues (Neurocomputing)
A multiobjective genetic algorithm for obtaining the optimal size of a recurrent neural network for grammatical inference (Pattern Recognition)
Artificial neural networks capable of learning spatiotemporal chemical diffusion in the cortical brain (Pattern Recognition)
Hierarchical multi-dimensional differential evolution for the design of beta basis function neural network (Neurocomputing)
Ontology alignment using artificial neural network for large-scale ontologies (International Journal of Metadata, Semantics and Ontologies)


Abstract

Determining the architecture of a neural network is an important issue for any learning task. For recurrent neural networks, no general methods exist for estimating the number of layers of hidden neurons, the size of the layers, or the number of weights. We present a simple pruning heuristic that significantly improves the generalization performance of trained recurrent networks. We illustrate this heuristic by training a fully recurrent neural network on positive and negative strings of a regular grammar. We also show that rules extracted from networks trained with this pruning heuristic are more consistent with the rules to be learned. This performance improvement is obtained by pruning and retraining the networks. Simulations are shown for training and pruning a recurrent neural network on strings generated by two regular grammars: a randomly generated 10-state grammar and an 8-state, triple-parity grammar. Further simulations indicate that this pruning method can yield generalization performance superior to that obtained by training with weight decay.
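
The abstract describes a prune-then-retrain cycle and compares it with weight decay, but it does not state the pruning criterion. The sketch below is therefore only an illustration under stated assumptions: it removes the smallest-magnitude weights, which is a common stand-in and not necessarily the authors' heuristic. All names (`prune_smallest`, `masked_sgd_step`, `prune_and_retrain`, the `retrain` callback) are hypothetical.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def prune_smallest(weights: Matrix, mask: Matrix, n_prune: int) -> Matrix:
    """Return a new mask with the n_prune smallest-magnitude active weights disabled.

    Assumption: magnitude is the pruning criterion; the abstract does not say.
    """
    active = [
        (abs(weights[i][j]), i, j)
        for i in range(len(weights))
        for j in range(len(weights[i]))
        if mask[i][j] == 1.0
    ]
    active.sort()  # smallest magnitude first
    new_mask = [row[:] for row in mask]
    for _, i, j in active[:n_prune]:
        new_mask[i][j] = 0.0
    return new_mask


def apply_mask(weights: Matrix, mask: Matrix) -> Matrix:
    """Zero out every weight whose mask entry is 0."""
    return [[w * m for w, m in zip(wr, mr)] for wr, mr in zip(weights, mask)]


def masked_sgd_step(weights: Matrix, grads: Matrix, mask: Matrix,
                    lr: float = 0.1, weight_decay: float = 0.0) -> Matrix:
    """One gradient step that keeps pruned weights at zero.

    A nonzero weight_decay adds an L2 penalty term, the alternative
    regularizer that the abstract compares pruning against.
    """
    return [
        [(w - lr * (g + weight_decay * w)) * m for w, g, m in zip(wr, gr, mr)]
        for wr, gr, mr in zip(weights, grads, mask)
    ]


def prune_and_retrain(weights: Matrix,
                      retrain: Callable[[Matrix, Matrix], Matrix],
                      rounds: int, n_prune: int) -> Tuple[Matrix, Matrix]:
    """Alternate pruning and retraining for a fixed number of rounds.

    `retrain` is a caller-supplied function (hypothetical here) that takes the
    masked weights and the mask and returns retrained weights.
    """
    mask = [[1.0] * len(row) for row in weights]
    for _ in range(rounds):
        mask = prune_smallest(weights, mask, n_prune)
        weights = retrain(apply_mask(weights, mask), mask)
    return weights, mask
```

In practice, `retrain` would run several epochs of `masked_sgd_step` on the training strings until the network classifies them correctly again; stopping when retraining no longer converges is one plausible termination rule, though the abstract does not specify one.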