In statistical language modelling, the classic model is the n-gram. This model, however, cannot capture long-term dependencies, i.e. dependencies spanning more than n tokens. An alternative is the probabilistic automaton. Unfortunately, preliminary experiments show that this model is not yet competitive for language modelling, partly because it tries to model dependencies that are too long. We propose to improve the use of this model by restricting the dependency length to a more reasonable value. Experiments show a 45% reduction in perplexity on the Wall Street Journal language modelling task.
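To make the two notions the abstract relies on concrete, here is a minimal sketch of an n-gram model (a bigram, n = 2) and the perplexity measure used to evaluate it. This is not the paper's method; the toy corpus, the add-alpha smoothing, and all function names are illustrative assumptions.

```python
import math
from collections import Counter

def train_bigram(corpus, alpha=1.0):
    """Train an add-alpha smoothed bigram model from a list of token lists.

    The model conditions each word only on its single predecessor, which is
    exactly the limitation the abstract points out: dependencies longer than
    n (here, 2) are invisible to it.
    """
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    v = len(vocab)

    def prob(prev, word):
        # add-alpha smoothing keeps unseen bigrams at non-zero probability
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * v)

    return prob

def perplexity(prob, corpus):
    """Perplexity = exp of the average negative log-probability per token."""
    log_sum, n = 0.0, 0
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(toks[:-1], toks[1:]):
            log_sum += -math.log(prob(prev, word))
            n += 1
    return math.exp(log_sum / n)

# Toy usage: a 45% perplexity reduction, as reported on the WSJ task,
# would mean this number dropping to roughly 0.55 of its value.
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
prob = train_bigram(train)
print(round(perplexity(prob, train), 3))
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why the paper reports its improvement as a percentage reduction in this quantity.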