A hybrid approach to statistical language modeling with multilayer perceptrons and unigrams

  • Authors:
  • Fernando Blat;María José Castro;Salvador Tortajada;Joan Andreu Sánchez

  • Affiliations:
  • Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, València, Spain;Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, València, Spain;Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, València, Spain;Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, València, Spain

  • Venue:
  • TSD'05 Proceedings of the 8th International Conference on Text, Speech and Dialogue
  • Year:
  • 2005

Abstract

In language engineering, language models are used to improve system performance. These are usually N-gram models whose probabilities are estimated from the occurrence frequencies of N-grams in large text databases. An alternative to this conventional frequency-based estimation is to estimate N-gram probabilities with neural networks. This paper presents a hybrid language model defined as a linear combination of two components: a connectionist N-gram model, which represents the global relations between certain linguistic categories, and a stochastic model of the distribution of words into those categories. The hybrid language model is tested on the Wall Street Journal corpus as processed in the Penn Treebank project.
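
The abstract does not give the model's formula, so the following is only a minimal sketch of how a category-level N-gram model and a word-into-category unigram distribution can work together. It makes several assumptions that are not from the paper: it uses the common class-based factorization P(w_i | history) ≈ P(c_i | c_{i-1}) · P(w_i | c_i) rather than the paper's exact combination, it replaces the connectionist (multilayer perceptron) category model with plain bigram counts as a stand-in, and the tiny tagged corpus and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy tagged corpus of (word, category) pairs; made up for illustration only.
tagged = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

cat_counts = Counter()                # occurrences of each category
word_cat_counts = Counter()           # occurrences of each (word, category) pair
bigram_counts = defaultdict(Counter)  # category bigram counts (stand-in for the MLP)

for sentence in tagged:
    prev = "<s>"  # sentence-start marker
    for word, cat in sentence:
        cat_counts[cat] += 1
        word_cat_counts[(word, cat)] += 1
        bigram_counts[prev][cat] += 1
        prev = cat


def p_word_given_cat(word, cat):
    """Unigram distribution of words within a category, P(w | c)."""
    if cat_counts[cat] == 0:
        return 0.0
    return word_cat_counts[(word, cat)] / cat_counts[cat]


def p_cat_given_prev(cat, prev):
    """Category bigram P(c_i | c_{i-1}); the paper uses a neural network here."""
    total = sum(bigram_counts[prev].values())
    if total == 0:
        return 0.0
    return bigram_counts[prev][cat] / total


def sentence_probability(sentence):
    """Product over positions of P(c_i | c_{i-1}) * P(w_i | c_i)."""
    prob = 1.0
    prev = "<s>"
    for word, cat in sentence:
        prob *= p_cat_given_prev(cat, prev) * p_word_given_cat(word, cat)
        prev = cat
    return prob


unseen = [("the", "DT"), ("cat", "NN"), ("barks", "VBZ")]
print(sentence_probability(unseen))
```

The sentence "the cat barks" never occurs in the toy corpus, yet it receives a nonzero probability because the category sequence DT NN VBZ was observed. This generalization across words that share a category is the usual motivation for modeling categories separately from words. A practical model would also need smoothing for unseen words and category transitions, which this sketch omits.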