Using perfect sampling in parameter estimation of a whole sentence maximum entropy language model

  • Authors:
  • F. Amaya; J. M. Benedí

  • Affiliations:
  • Universidad Politécnica de Valencia, Valencia, Spain; Universidad Politécnica de Valencia, Valencia, Spain

  • Venue:
  • ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
  • Year:
  • 2000

Abstract

The Maximum Entropy (ME) principle is an appropriate framework for combining information of a diverse nature from several sources into the same language model. In order to incorporate long-distance information into the ME framework, a Whole Sentence Maximum Entropy Language Model (WSME) can be used. Until now, Markov Chain Monte Carlo (MCMC) sampling techniques have been used to estimate the parameters of the WSME model. In this paper, we propose the application of another sampling technique: Perfect Sampling (PS). Experiments show a 30% reduction in the perplexity of the WSME model with respect to the trigram model, and a 2% reduction with respect to the WSME model trained with MCMC.
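The core idea behind Perfect Sampling is coupling from the past (Propp–Wilson): coupled chains are run from every state starting at time −T with shared randomness; once they coalesce, the common state at time 0 is an exact draw from the stationary distribution, with no burn-in diagnostics needed. The sketch below illustrates the mechanism on a toy three-state Markov chain; it is not the WSME sentence distribution from the paper, and all names here are illustrative.

```python
import random

# Toy 3-state Markov chain (rows sum to 1) -- illustrative only,
# not the WSME model from the paper.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def step(state, u):
    """Deterministic update rule: move from `state` using the shared
    uniform draw `u` (inverse-CDF over the transition row)."""
    acc = 0.0
    for nxt, p in enumerate(P[state]):
        acc += p
        if u < acc:
            return nxt
    return len(P) - 1

def cftp(rng):
    """Coupling from the past: return one exact sample from the
    stationary distribution of P."""
    T = 1
    randomness = []  # uniform assigned to time -t is randomness[t-1];
                     # it is fixed once drawn and reused as T grows
    while True:
        while len(randomness) < T:
            randomness.append(rng.random())
        # Run coupled chains from EVERY state, time -T up to 0,
        # applying the same uniform at each time step.
        states = list(range(len(P)))
        for t in range(T, 0, -1):
            u = randomness[t - 1]
            states = [step(s, u) for s in states]
        if len(set(states)) == 1:  # all chains have coalesced
            return states[0]       # exact stationary sample
        T *= 2                     # restart further back in the past

# Draw independent exact samples (fresh seed per call).
samples = [cftp(random.Random(i)) for i in range(2000)]
```

For WSME training, such exact samples replace MCMC draws in the Monte Carlo estimates of the feature expectations, removing the bias that comes from stopping an MCMC chain before convergence.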