A neural syntactic language model

  • Authors:
  • Frederick Jelinek; Ahmad Emami

  • Affiliations:
  • The Johns Hopkins University; The Johns Hopkins University

  • Venue:
  • A neural syntactic language model
  • Year:
  • 2006

Abstract

Statistical language models are widely used in fields dealing with speech or natural language. Examples are Automatic Speech Recognition (ASR) and Machine Translation (MT), which are assuming increasingly fundamental roles in information processing systems. The role of a statistical language model is to provide prior knowledge of the likelihood of any sentence being spoken or generated. The focus of this thesis is the use of neural probabilistic models as statistical language models. The neural probabilistic model is simply a standard multi-layer neural network that, given an input, produces probabilities at its output. The use of sophisticated machine learning tools such as neural networks stands in major contrast with the simple and straightforward n-gram techniques conventionally used in language modeling. The major problems of n-gram models are that they lack the capability to capture long-term dependencies and to automatically detect and take advantage of semantic or syntactic similarities among words and phrases, whenever present. Using neural network models, even within the simple n-gram paradigm, we were able to achieve considerable improvements in perplexity on the standard UPenn corpus and considerable reductions in Word Error Rate (WER) on the HUB-1 Wall Street Journal (WSJ) test set, as well as on a recent conversational telephony system (the Fisher corpus). The neural probabilistic model makes use of a distributed representation of the items in the conditioning history, and in contrast to n-gram models it is powerful in capturing long-term dependencies and in generalizing from seen examples to unseen events. This thesis starts by using neural network models in the conventional word n-gram modeling paradigm. Taking advantage of the neural probabilistic model's capability of using a large number of inputs, we add extra information for the neural network to use, examples of which are multi-word and class information. The next part of the thesis focuses on applying neural probabilistic models to a syntax-based language model. The capability of neural probabilistic models to use a large number of inputs from different sources makes them an ideal modeling choice for efficiently exploiting the large amount of information available in a syntactic parse when estimating the next word in a string. (Abstract shortened by UMI.)
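
To make the idea concrete, the sketch below shows the general shape of a feed-forward neural probabilistic language model of the kind the abstract describes: each item in the conditioning history is mapped to a distributed (embedding) representation, the embeddings are concatenated with any extra conditioning features, and a softmax over the vocabulary gives the probability of the next word. This is a minimal illustrative sketch in Python with PyTorch, not the thesis's actual implementation; the class name NeuralLM, all dimensions, and the optional extra_features input (standing in for word-class or syntactic-parse information) are assumptions made for illustration.

# Minimal sketch of a feed-forward neural probabilistic language model.
# Each history word is embedded, the embeddings are concatenated (optionally
# together with extra conditioning features), passed through a hidden layer,
# and a softmax over the vocabulary yields P(next word | history).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralLM(nn.Module):
    def __init__(self, vocab_size, context_size=3, embed_dim=64,
                 hidden_dim=128, extra_feature_dim=0):
        # extra_feature_dim > 0 lets the model condition on additional inputs
        # (e.g., class or syntactic-head features), reflecting the abstract's
        # point that the network can take many heterogeneous inputs.
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        in_dim = context_size * embed_dim + extra_feature_dim
        self.hidden = nn.Linear(in_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, history_ids, extra_features=None):
        # history_ids: (batch, context_size) indices of the previous words
        x = self.embed(history_ids).flatten(start_dim=1)
        if extra_features is not None:
            x = torch.cat([x, extra_features], dim=1)
        h = torch.tanh(self.hidden(x))
        return F.log_softmax(self.out(h), dim=1)  # log P(word | history)

# Usage example: log-probabilities of the next word given a 3-word history.
model = NeuralLM(vocab_size=10000, context_size=3)
history = torch.tensor([[12, 457, 9]])   # indices of the three previous words
log_probs = model(history)               # shape (1, 10000)

Because the input is just a concatenation of embedded items, widening the conditioning history or adding parse-derived features only changes the input dimension, which is the property the thesis exploits when moving from word n-gram histories to syntactic ones.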