Incremental training of first order recurrent neural networks to predict a context-sensitive language

  • Authors:
  • Stephan K. Chalup; Alan D. Blair

  • Affiliations:
  • School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW 2308, Australia
  • School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW 2052, Australia

  • Venue:
  • Neural Networks
  • Year:
  • 2003

Abstract

In recent years it has been shown that first order recurrent neural networks trained by gradient descent can learn not only regular but also simple context-free and context-sensitive languages. However, the success rate was generally low and severe instability issues were encountered. The present study examines the hypothesis that a combination of evolutionary hill climbing with incremental learning and a well-balanced training set enables first order recurrent networks to reliably learn context-free and mildly context-sensitive languages. In particular, we trained the networks to predict symbols in string sequences of the context-sensitive language {aⁿbⁿcⁿ; n ≥ 1}. Comparative experiments with and without incremental learning indicated that incremental learning can accelerate and facilitate training. Furthermore, incrementally trained networks generally produced monotonic trajectories in hidden unit activation space, while the trajectories of non-incrementally trained networks were oscillating. The non-incrementally trained networks were more likely to generalise.
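To make the prediction task concrete, the sketch below builds a next-symbol prediction dataset for {aⁿbⁿcⁿ; n ≥ 1} and orders it as an incremental curriculum (short strings before long ones). The symbol encoding, the helper names, and the choice to use the start of the next string as the final target are illustrative assumptions, not details taken from the paper.

```python
def anbncn(n):
    """Return the string a^n b^n c^n for a given n >= 1."""
    return "a" * n + "b" * n + "c" * n

def prediction_pairs(n):
    """(context, next-symbol) pairs for one string of a^n b^n c^n.

    After the full string, the target is 'a' -- the first symbol of
    the next string in the sequence (an assumed convention here).
    """
    s = anbncn(n)
    pairs = [(s[:i], s[i]) for i in range(1, len(s))]
    pairs.append((s, "a"))
    return pairs

# Incremental learning presents easy instances first: a curriculum
# over n, from the shortest strings upward.
curriculum = [prediction_pairs(n) for n in range(1, 5)]
```

Note that only some targets are deterministic: after seeing aⁿ, the position of the first b is not predictable, but once the first b appears, the number of remaining b's and all of the c's are fully determined, which is what makes the language a useful probe of counting behaviour in recurrent networks.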