Distributed language modeling for N-best list re-ranking

Authors:
Ying Zhang; Almut Silja Hildebrand; Stephan Vogel
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Year:
2006

Abstract

In this paper we describe a novel distributed language model for N-best list re-ranking. The model follows the client/server paradigm: each server hosts a portion of the data and provides information to the client. This design makes it possible to use an arbitrarily large corpus efficiently, and it provides a natural platform for relevance weighting and selection. We applied the model to a 2.97 billion-word corpus and re-ranked the N-best list from Hiero, a state-of-the-art hierarchical phrase-based system. Measured by BLEU, the re-ranked translation achieves a relative improvement of 4.8%, significantly better than the model-best translation.
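
The client/server arrangement described in the abstract can be illustrated with a small in-process sketch. The abstract does not specify what the servers return or how the client scores hypotheses, so everything below is an assumption made for illustration: the class names `CorpusShard` and `RerankClient` are hypothetical, each "server" is a plain object rather than a networked process, the servers return raw n-gram counts, and the client scores a hypothesis by summing log(1 + count) over its bigrams. It is not the authors' implementation.

```python
from collections import Counter
import math


class CorpusShard:
    """Stand-in for one server: holds n-gram counts for its slice of the corpus."""

    def __init__(self, sentences, max_order=2):
        self.counts = Counter()
        for sentence in sentences:
            tokens = sentence.split()
            for n in range(1, max_order + 1):
                for i in range(len(tokens) - n + 1):
                    self.counts[tuple(tokens[i:i + n])] += 1

    def count(self, ngram):
        # A Counter returns 0 for n-grams this shard has never seen.
        return self.counts[tuple(ngram)]


class RerankClient:
    """Stand-in for the client: queries every shard and aggregates the answers."""

    def __init__(self, shards):
        self.shards = shards

    def total_count(self, ngram):
        # Each shard only knows its own data; the client sums across shards.
        return sum(shard.count(ngram) for shard in self.shards)

    def score(self, hypothesis):
        # Assumed feature: sum of log(1 + count) over bigrams (not a probability).
        tokens = hypothesis.split()
        return sum(
            math.log1p(self.total_count(tokens[i:i + 2]))
            for i in range(len(tokens) - 1)
        )

    def rerank(self, nbest):
        # Highest-scoring hypothesis first.
        return sorted(nbest, key=self.score, reverse=True)


shards = [
    CorpusShard(["the cat sat on the mat"]),
    CorpusShard(["the cat ate", "a dog sat on the mat"]),
]
client = RerankClient(shards)
nbest = ["cat the sat mat on the", "the cat sat on the mat"]
reranked = client.rerank(nbest)
```

Because each shard answers only for its own portion of the data, adding more corpus in this sketch means adding more shards rather than enlarging any single one, which is the scaling property the abstract points to.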