In this paper we describe a novel distributed language model for N-best list re-ranking. The model follows the client/server paradigm: each server hosts a portion of the data and provides information to the client. This design makes it possible to use an arbitrarily large corpus efficiently, and it provides a natural platform for relevance weighting and selection. We applied the model to a 2.97 billion-word corpus and re-ranked the N-best list from Hiero, a state-of-the-art phrase-based system. Measured by BLEU, the re-ranked translation achieves a relative improvement of 4.8% over the model-best translation, a significant gain.
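The client/server arrangement described above can be illustrated with a minimal sketch. It makes several assumptions: the "servers" are plain in-process objects rather than networked processes, each one stores n-gram counts for its portion of the corpus in a dictionary, and the client scores a hypothesis by the average log of the summed n-gram counts. The names `ShardServer`, `Client`, `ngram_count`, `score`, and `rerank` are hypothetical, and the scoring function is a stand-in chosen for brevity, not the feature used in the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], max_order: int) -> List[Ngram]:
    """Return every n-gram of order 1..max_order found in tokens."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_order + 1)
        for i in range(len(tokens) - n + 1)
    ]


class ShardServer:
    """Hypothetical stand-in for one server hosting a portion of the corpus."""

    def __init__(self, sentences: Sequence[str], max_order: int = 3) -> None:
        self.counts: Counter = Counter()
        for sentence in sentences:
            self.counts.update(ngrams(sentence.split(), max_order))

    def count(self, ngram: Ngram) -> int:
        # Counter returns 0 for n-grams this shard has never seen.
        return self.counts[ngram]


class Client:
    """Queries every server and combines the answers (assumed: by summing)."""

    def __init__(self, servers: Sequence[ShardServer], max_order: int = 3) -> None:
        self.servers = list(servers)
        self.max_order = max_order

    def ngram_count(self, ngram: Ngram) -> int:
        # Each shard holds a disjoint part of the corpus, so the corpus-wide
        # count is the sum of the per-shard counts.
        return sum(server.count(ngram) for server in self.servers)

    def score(self, hypothesis: str) -> float:
        # Assumed scoring rule: length-normalized sum of log(1 + count).
        tokens = hypothesis.split()
        total = sum(
            math.log1p(self.ngram_count(g)) for g in ngrams(tokens, self.max_order)
        )
        return total / max(len(tokens), 1)

    def rerank(self, nbest: Sequence[str]) -> List[str]:
        # Highest-scoring hypothesis first.
        return sorted(nbest, key=self.score, reverse=True)
```

In this sketch, adding more data means adding more `ShardServer` instances; the client code does not change. A per-server weight inside `ngram_count` would be one plausible place to attach relevance weighting, though that too is an assumption rather than something the abstract specifies.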