Learning a syntagmatic and paradigmatic structure from language data with a bi-multigram model

Authors:
Sabine Deligne; Yoshinori Sagisaka
Affiliations:
ATR-ITL, Kyoto fu, Japan; ATR-ITL, Kyoto fu, Japan
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Year:
1998

Abstract

In this paper, we present a stochastic language modeling tool designed to retrieve variable-length phrases (multigrams) under the assumption of bigram dependencies between them. Phrase retrieval can be interleaved with a phrase clustering procedure, so that the language data are structured iteratively at both the paradigmatic and the syntagmatic level in a fully integrated way. On ATR travel arrangement data, the perplexity of the bi-multigram model comes very close to that of a trigram model, with fewer entries in the language model. We also illustrate the ability of the class version of the model to merge semantically related phrases into a common class.
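As a rough illustration of what "bigram dependencies between variable-length phrases" can mean in practice, the sketch below computes the likelihood of a word sequence as the sum, over all segmentations into phrases of bounded length, of the product of phrase-bigram probabilities. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives no equations, so the sum-over-segmentations formulation, the function name `sentence_likelihood`, the `max_len` bound, the `<s>` start phrase, and the toy probabilities are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical sentence-start phrase; not taken from the abstract.
BOS = ("<s>",)


def sentence_likelihood(words, bigram_prob, max_len=3):
    """Likelihood of `words` under an assumed bi-multigram formulation.

    Sums, over every segmentation of `words` into phrases of at most
    `max_len` words, the product of p(phrase | previous phrase).
    `bigram_prob` maps (previous_phrase, phrase) tuples to probabilities;
    missing pairs are treated as probability 0.
    """
    n = len(words)
    # alpha[t][phrase]: total probability of all segmentations of words[:t]
    # whose last phrase is `phrase`.
    alpha = [defaultdict(float) for _ in range(n + 1)]
    alpha[0][BOS] = 1.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            phrase = tuple(words[start:end])
            for prev, mass in alpha[start].items():
                p = bigram_prob.get((prev, phrase), 0.0)
                alpha[end][phrase] += mass * p
    return sum(alpha[n].values())


# Made-up toy probabilities, for illustration only.
toy_probs = {
    (BOS, ("a",)): 0.5,
    (("a",), ("b",)): 0.4,
    (BOS, ("a", "b")): 0.25,
}

if __name__ == "__main__":
    # Two segmentations contribute here: [a][b] and [a b].
    print(sentence_likelihood(["a", "b"], toy_probs, max_len=2))
```

The forward-style recursion keeps one accumulator per (position, last phrase) pair, so the cost grows with the sentence length and the number of candidate phrases rather than with the number of segmentations. A class version along the lines of the abstract could, as one assumed design, replace each phrase by its class label in the conditioning context.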