Organization of the state space of a simple recurrent network before and after training on recursive linguistic structures

  • Authors:
  • Michal Čerňanský; Matej Makula; Ľubica Beňušková

  • Affiliations:
  • Faculty of Informatics and Information Technologies, Slovak Technical University, Ilkovičova 3, 842 16 Bratislava 4, Slovakia (M. Čerňanský, M. Makula); Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava 4, Slovakia (Ľ. Beňušková)

  • Venue:
  • Neural Networks
  • Year:
  • 2007

Abstract

Recurrent neural networks are often employed in the cognitive science community to process symbol sequences that represent various natural language structures. The aim is to study possible neural mechanisms of language processing and to aid the development of artificial language processing systems. We used data sets containing recursive linguistic structures and trained the Elman simple recurrent network (SRN) on the next-symbol prediction task. Concentrating on neuron activation clusters in the recurrent layer of the SRN, we investigate the organization of the network state space before and after training. Given an SRN and a training stream, we construct predictive models, called neural prediction machines, that directly employ the state-space dynamics of the network. We demonstrate two important properties of representations of recursive symbol series in the SRN. First, the clusters of recurrent activations that emerge before training are meaningful and correspond to Markov prediction contexts. We show that the prediction states that naturally arise in an SRN initialized with small random weights approximately correspond to the states of Variable Memory Length Markov Models (VLMMs) based on individual symbols (i.e. words). Second, we demonstrate that during training the SRN reorganizes its state space according to word categories and their grammatical subcategories, and that next-symbol prediction is again based on the VLMM strategy. However, after training, the prediction is based on word categories and their grammatical subcategories rather than on individual words. Our conclusions hold for small depths of recursion that are comparable to human performance. The methods of SRN training and state-space analysis introduced in this paper are of a general nature and can be used to investigate the processing of any other symbolic time series by means of an SRN.
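
To make the described pipeline concrete, the following is a minimal sketch of an Elman SRN trained for next-symbol prediction, followed by a neural-prediction-machine (NPM) style model built by clustering the recurrent activations and attaching next-symbol counts to each cluster. Everything here is an illustrative assumption: the toy alphabet and stream, the network sizes, the one-step gradient updates, and the simple k-means clustering are stand-ins, not the authors' actual corpora, architecture, or procedure, and no VLMM comparison is included.

```python
# Hedged sketch (assumed toy setup): Elman SRN for next-symbol prediction
# plus an NPM built from clusters of recurrent activations.
import numpy as np

rng = np.random.default_rng(0)

# Toy symbol stream: a placeholder for the recursive linguistic corpus.
alphabet = ["a", "b", "c", "#"]
idx = {s: i for i, s in enumerate(alphabet)}
stream = list("ab#abc#ab#abc#" * 50)
X = np.eye(len(alphabet))[[idx[s] for s in stream[:-1]]]   # one-hot inputs
T = np.array([idx[s] for s in stream[1:]])                 # next-symbol targets

n_in, n_hid, n_out = len(alphabet), 10, len(alphabet)
W_ih = rng.normal(0, 0.1, (n_hid, n_in))   # small random initial weights
W_hh = rng.normal(0, 0.1, (n_hid, n_hid))
W_ho = rng.normal(0, 0.1, (n_out, n_hid))
lr = 0.1

def forward(x, h_prev):
    """One SRN step: recurrent tanh layer, softmax over next symbols."""
    h = np.tanh(W_ih @ x + W_hh @ h_prev)
    o = W_ho @ h
    p = np.exp(o - o.max()); p /= p.sum()
    return h, p

def run_epoch(train=True):
    """One pass over the stream; truncated (one-step) gradient updates."""
    global W_ih, W_hh, W_ho
    h = np.zeros(n_hid)
    states = []
    for x, t in zip(X, T):
        h_prev = h
        h, p = forward(x, h_prev)
        states.append(h)
        if train:
            d_o = p.copy(); d_o[t] -= 1.0            # cross-entropy gradient
            d_h = (W_ho.T @ d_o) * (1.0 - h ** 2)
            W_ho -= lr * np.outer(d_o, h)
            W_ih -= lr * np.outer(d_h, x)
            W_hh -= lr * np.outer(d_h, h_prev)
    return np.array(states)

def npm_from_states(states, targets, n_clusters=8):
    """NPM sketch: cluster recurrent activations (plain k-means) and
    attach smoothed next-symbol counts to every cluster."""
    centers = states[rng.choice(len(states), n_clusters, replace=False)]
    for _ in range(20):
        assign = np.argmin(((states[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centers[k] = states[assign == k].mean(axis=0)
    counts = np.ones((n_clusters, n_out))            # Laplace smoothing
    for k, t in zip(assign, targets):
        counts[k, t] += 1
    return centers, counts / counts.sum(axis=1, keepdims=True)

# NPM from the untrained SRN (clusters ~ Markov prediction contexts).
states_before = run_epoch(train=False)
npm_before = npm_from_states(states_before, T)

# NPM after a few training epochs (state space reorganized by training).
for _ in range(5):
    states_after = run_epoch(train=True)
npm_after = npm_from_states(states_after, T)
```

Comparing `npm_before` and `npm_after` on held-out data is the kind of before/after state-space analysis the abstract refers to; in this sketch the comparison is left to the reader and the clustering granularity (`n_clusters`) is an arbitrary choice.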