Unsupervised learning by probabilistic latent semantic analysis
Machine Learning
Probabilistic top-down parsing and language modeling
Computational Linguistics
Exploiting syntactic structure for language modeling
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Immediate-head parsing for language models
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
A syntax-based statistical translation model
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
ICML '05 Proceedings of the 22nd international conference on Machine learning
Speech and Language Processing (2nd Edition)
Speech and Language Processing (2nd Edition)
A hierarchical phrase-based model for statistical machine translation
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Hierarchical Phrase-Based Translation
Computational Linguistics
Distributed language modeling for N-best list re-ranking
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduce
Stochastic analysis of lexical and semantic enhanced structural language model
ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
Large-scale syntactic language modeling with treelets
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Hi-index | 0.00 |
This paper presents an attempt at building a large scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov random field paradigm. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm that has linear time complexity and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens and stored on a supercomputer. The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and "readability" when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.