A large scale distributed syntactic, semantic and lexical language model for machine translation

Authors:
Ming Tan;Wenli Zhou;Lei Zheng;Shaojun Wang
Affiliations:
Wright State University, Dayton, OH;Wright State University, Dayton, OH;Wright State University, Dayton, OH;Wright State University, Dayton, OH
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Abstract

This paper presents an attempt at building a large-scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov random field paradigm. The composite language model is trained with a convergent N-best list approximate EM algorithm of linear time complexity, followed by an EM algorithm that improves word prediction power, on corpora of up to a billion tokens, and it is stored on a supercomputer. The large-scale distributed composite language model gives a drastic perplexity reduction over n-grams and achieves significantly better translation quality, as measured by the BLEU score and "readability", when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
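
The abstract reports results on two quantities, perplexity and N-best list re-ranking, that can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy unigram table, the `lm_weight` parameter, and the log-linear combination of decoder score and language model score are all assumptions made for illustration. In the paper's setting a composite language model would take the place of the toy scorer.

```python
import math
from typing import Callable, List, Tuple


def perplexity(log_probs: List[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


def rerank(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Sort (hypothesis, decoder_score) pairs, best first.

    The combined score is decoder_score + lm_weight * lm_logprob(hypothesis).
    This combination is a hypothetical choice, not taken from the paper.
    """
    return sorted(
        nbest,
        key=lambda hyp: hyp[1] + lm_weight * lm_logprob(hyp[0]),
        reverse=True,
    )


# Toy unigram model standing in for a real language model (assumption).
TOY_UNIGRAM = {"the": 0.5, "cat": 0.25, "sat": 0.25}


def toy_lm_logprob(sentence: str) -> float:
    """Sum of unigram log probabilities; unseen words get a small floor."""
    return sum(math.log(TOY_UNIGRAM.get(word, 1e-6)) for word in sentence.split())


# The decoder slightly prefers the first hypothesis; the LM score overturns that.
nbest = [("the cat zzz", -1.0), ("the cat sat", -1.2)]
reranked = rerank(nbest, toy_lm_logprob, lm_weight=1.0)
best_hypothesis = reranked[0][0]
```

In this toy run the language model term penalizes the hypothesis containing the unseen word, so `best_hypothesis` is `"the cat sat"` even though the decoder score alone ranks it second.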