This paper presents a study of using neural probabilistic models in a syntax-based language model. The neural probabilistic model uses a distributed representation of the items in the conditioning history and is therefore well suited to capturing long-range dependencies. Employing neural network based models in the syntax-based language model lets it efficiently exploit the large amount of information available in a syntactic parse when estimating the next word in a string. Several scenarios for integrating neural networks into the syntax-based language model are presented, together with derivations of the corresponding training procedures. Experiments on the UPenn Treebank and the Wall Street Journal corpus show significant improvements in perplexity and word error rate over the baseline structured language model (SLM). Furthermore, comparisons with standard and neural network based N-gram models with arbitrarily long contexts show that syntactic information is indeed helpful in estimating word-string probabilities. Overall, our neural syntax-based model achieves the best published perplexity and WER results on the given data sets.
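As a concrete illustration of the modeling idea, below is a minimal sketch in Python/numpy of a Bengio-style neural probabilistic language model, not the authors' implementation. The vocabulary size, layer dimensions, and the use of plain token ids for the conditioning history are illustrative assumptions; in the paper the conditioning items would instead be elements exposed by a syntactic parse, such as the heads of the partial parse.

    # Minimal sketch of a neural probabilistic language model:
    # each item in the conditioning history is mapped to a learned
    # distributed representation, and the concatenated embeddings
    # feed a one-hidden-layer network that outputs a distribution
    # over the next word. All sizes below are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)

    V, D, H, N = 50, 16, 32, 3   # vocab size, embedding dim, hidden dim, history length

    # Parameters: embedding table and two dense layers.
    E  = rng.normal(0, 0.1, (V, D))      # distributed representations
    W1 = rng.normal(0, 0.1, (N * D, H))
    b1 = np.zeros(H)
    W2 = rng.normal(0, 0.1, (H, V))
    b2 = np.zeros(V)

    def forward(history):
        """P(next word | history) for a list of N conditioning item ids."""
        x = E[history].reshape(-1)       # concatenate the N embeddings
        h = np.tanh(x @ W1 + b1)         # hidden layer
        z = h @ W2 + b2                  # output scores
        z -= z.max()                     # shift for numerical stability
        p = np.exp(z)
        return p / p.sum()               # softmax over the vocabulary

    # Usage: probability of word 7 given a 3-item conditioning history.
    p = forward([4, 11, 2])
    print(p[7], p.sum())                 # p sums to 1.0 up to rounding

Because the history enters only through its embeddings, the same network can condition on syntactic items rather than the previous N words, which is the substitution the syntax-based model exploits.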