In recent years, the quantity of parallel training data available for statistical machine translation has increased far more rapidly than the performance of individual computers, resulting in a potentially serious impediment to progress. Parallelizing the model-building algorithms that process this data on computer clusters is fraught with challenges such as synchronization, data exchange, and fault tolerance. However, the MapReduce programming paradigm has recently emerged as one solution to these issues: a powerful functional abstraction hides system-level details from the researcher, allowing programs to be transparently distributed across potentially very large clusters of commodity hardware. We describe MapReduce implementations of two algorithms used to estimate the parameters for two word alignment models and one phrase-based translation model, all of which rely on maximum likelihood probability estimates. On a 20-machine cluster, experimental results show that our solutions exhibit good scaling characteristics compared to a hypothetical, optimally parallelized version of current state-of-the-art single-core tools.
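As an illustration of how a maximum likelihood estimate can be phrased as a map step followed by a reduce step, here is a minimal single-process sketch in Python. It is not the authors' implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `estimate`), the toy word pairs, and the choice to group counts by source word are all assumptions made for illustration, and the "cluster" is simulated with an in-memory dictionary. The sketch computes relative frequencies p(target | source) = count(source, target) / count(source).

```python
from collections import Counter, defaultdict


def map_phase(record):
    """Map step: emit (source, (target, 1)) for one observed pair."""
    source, target = record
    yield source, (target, 1)


def shuffle(pairs):
    """Simulated shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(source, values):
    """Reduce step: sum counts per target, then normalize by the
    total for this source to get relative frequencies."""
    counts = Counter()
    for target, n in values:
        counts[target] += n
    total = sum(counts.values())
    return source, {t: c / total for t, c in counts.items()}


def estimate(records):
    """Run map, shuffle, and reduce over an iterable of (source, target) pairs."""
    emitted = (kv for rec in records for kv in map_phase(rec))
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical toy data: pairs such as those extracted from aligned sentences.
pairs = [
    ("house", "maison"),
    ("house", "maison"),
    ("house", "domicile"),
    ("the", "la"),
]
model = estimate(pairs)
```

Because each reducer sees every count for a given source word, it can normalize locally without a second pass. On a real cluster the shuffle would be performed by the framework, and a combiner could pre-aggregate counts on each mapper to reduce data exchange.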