Class-based n-gram models of natural language
Computational Linguistics
Introducing statistical dependencies and structural constraints in variable-length sequence models
ICGI '96: Proceedings of the 3rd International Colloquium on Grammatical Inference: Learning Syntax from Sentences
Learning local lexical structure in spontaneous speech language modeling
Variable-order N-gram generation by word-class splitting and consecutive word grouping
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
In this paper, we present a stochastic language modeling tool that retrieves variable-length phrases (multigrams), assuming bigram dependencies between them. Phrase retrieval can be interleaved with a phrase clustering procedure, so that the language data are iteratively structured at both the paradigmatic and the syntagmatic level in a fully integrated way. Perplexity results on ATR travel arrangement data with a bi-multigram model (assuming bigram correlations between the phrases) come very close to trigram scores while requiring fewer entries in the language model. The ability of the class version of the model to merge semantically related phrases into a common class is also illustrated.
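The core operation the abstract describes is segmenting a word sequence into variable-length phrases under a bigram model over phrases. A minimal sketch of such a segmentation step is shown below; the phrase inventory, the probability values, and the back-off constant are all invented for illustration and are not taken from the paper, which additionally re-estimates the inventory and clusters phrases iteratively.

```python
import math

# Hypothetical toy inventory of variable-length phrases (multigrams).
PHRASES = {
    ("i", "would", "like"), ("a", "room"),
    ("i",), ("would",), ("like",), ("a",), ("room",),
}

# Illustrative bigram probabilities between phrases; empty here, so every
# phrase pair backs off to a small constant.
BIGRAM = {}

def bigram_p(prev, cur):
    """Probability of phrase `cur` following phrase `prev`, with back-off."""
    return BIGRAM.get((prev, cur), 1e-3)

def best_segmentation(words, max_len=3):
    """Viterbi-style dynamic programming over segmentation points.

    best[i] holds the highest log-probability of any segmentation of
    words[:i], plus a back-pointer and the last phrase used.
    """
    n = len(words)
    best = [(-math.inf, None, None)] * (n + 1)
    best[0] = (0.0, None, ("<s>",))  # sentence-start pseudo-phrase
    for i in range(1, n + 1):
        for length in range(1, min(max_len, i) + 1):
            phrase = tuple(words[i - length:i])
            if phrase not in PHRASES:
                continue
            prev_score, _, prev_phrase = best[i - length]
            if prev_score == -math.inf:
                continue
            score = prev_score + math.log(bigram_p(prev_phrase, phrase))
            if score > best[i][0]:
                best[i] = (score, i - length, phrase)
    # Trace back the best phrase sequence.
    seg, i = [], n
    while i > 0:
        _, j, phrase = best[i]
        seg.append(phrase)
        i = j
    return list(reversed(seg))
```

With a uniform back-off, the search simply prefers segmentations with fewer phrases, e.g. `best_segmentation(["i", "would", "like", "a", "room"])` groups the sentence into the two multi-word phrases; a full bi-multigram trainer would alternate this segmentation step with re-estimation of the phrase probabilities.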