Rules and generalization capacity extraction from ANN with GP
IWANN'03: Proceedings of the 7th International Work-Conference on Artificial and Natural Neural Networks: Computational Methods in Neural Modeling, Volume 1
A hybrid approach to statistical language modeling with multilayer perceptrons and unigrams
TSD'05: Proceedings of the 8th International Conference on Text, Speech and Dialogue
The increased availability of text corpora and the growth of connectionism have stimulated a renewed interest in probabilistic models of language processing in computational linguistics and psycholinguistics. The Simple Recurrent Network (SRN) is an important connectionist model because it has the potential to learn temporal dependencies of unspecified length. In addition, many computational questions about the SRN's ability to learn dependencies between individual items extend to other models. This paper reports on experiments with an SRN trained on a large corpus and examines the ability of the network to learn bigrams, trigrams, and higher-order n-grams as a function of the size of the corpus. Performance is evaluated by an information-theoretic measure of prediction (or guess) ranking and by the entropy of the output vector. With enough training and hidden units, the SRN shows the ability to learn 5- and 6-gram dependencies, although learning an n-gram is contingent on its frequency and on the relative frequency of other n-grams. In some cases, the network will learn relatively low-frequency deep dependencies before relatively high-frequency short ones if the deep dependencies do not require representational shifts in hidden unit space.
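To make the setup concrete, here is a minimal sketch of an Elman-style SRN together with the two evaluation quantities the abstract names: the rank of the correct next word in the network's prediction ordering, and the Shannon entropy of the output vector. All names, dimensions, and initialization choices below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class SimpleRecurrentNetwork:
    """Elman-style SRN sketch: one-hot word input, tanh hidden layer,
    and a context layer holding a copy of the previous hidden state.
    (Untrained; weights and sizes are illustrative assumptions.)"""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0.0, 0.1, (hidden_size, vocab_size))
        self.W_hh = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
        self.W_hy = rng.normal(0.0, 0.1, (vocab_size, hidden_size))
        self.context = np.zeros(hidden_size)  # previous hidden state

    def step(self, token_id):
        # One word per time step, presented as a one-hot vector.
        x = np.zeros(self.W_xh.shape[1])
        x[token_id] = 1.0
        h = np.tanh(self.W_xh @ x + self.W_hh @ self.context)
        self.context = h  # Elman "copy-back" of the hidden layer
        logits = self.W_hy @ h
        p = np.exp(logits - logits.max())
        return p / p.sum()  # predicted distribution over the next word

def output_entropy(p):
    # Shannon entropy of the output vector, in bits: low entropy means
    # the network is committed to few continuations.
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def guess_rank(p, target_id):
    # 1-based rank of the correct next word when predictions are
    # sorted from most to least probable (1 = best guess).
    order = np.argsort(-p)
    return int(np.where(order == target_id)[0][0]) + 1
```

Averaging `guess_rank` and `output_entropy` over a held-out corpus would then track, as in the experiments described, how sharply the network predicts upcoming words and hence which n-gram dependencies it has absorbed.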