Long distance dependency in language modeling: an empirical study

Authors:
Jianfeng Gao;Hisami Suzuki
Affiliations:
Microsoft Research, Asia, Beijing;Microsoft Research, Redmond, WA
Venue:
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Year:
2004

Citing 9

Inference and Estimation of a Long-Range Trigram Model

Discovery of linguistic relations using lexical attraction

Probabilistic top-down parsing and language modeling
Computational Linguistics

A new statistical parser based on bigram lexical dependencies
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics

Immediate-head parsing for language models
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics

Exploring asymmetric clustering for statistical language modeling
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Unsupervised learning of dependency structure for language modeling
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Exploiting headword dependency and predictive clustering for language modeling
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10

Long distance dependency in language modeling: an empirical study
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Cited 2

Character-level dependencies in Chinese: usefulness and learning
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics

Long distance dependency in language modeling: an empirical study
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Abstract

This paper presents an extensive empirical study on two language modeling techniques, linguistically-motivated word skipping and predictive clustering, both of which are used in capturing long distance word dependencies that are beyond the scope of a word trigram model. We compare the techniques to others that were proposed previously for the same purpose. We evaluate the resulting models on the task of Japanese Kana-Kanji conversion. We show that the two techniques, while simple, outperform existing methods studied in this paper, and lead to language models that perform significantly better than a word trigram model. We also investigate how factors such as training corpus size and genre affect the performance of the models.