On the limited memory BFGS method for large scale optimization
Mathematical Programming: Series A and B
A maximum entropy approach to natural language processing
Computational Linguistics
A systematic comparison of various statistical alignment models
Computational Linguistics
Empirical methods for compound splitting
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Word normalization and decompounding in mono- and bilingual IR
Information Retrieval
High quality word graphs using forward-backward pruning
ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 02
Hierarchical Phrase-Based Translation
Computational Linguistics
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Lattice-based minimum error rate training for statistical machine translation
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Arabic preprocessing schemes for statistical machine translation
NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Unsupervised morphological segmentation with log-linear models
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Exploring different representational units in English-to-Turkish statistical machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Towards better machine translation quality for the German--English language pairs
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Optimizing Chinese word segmentation for machine translation performance
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Learning word-class lattices for definition and hypernym extraction
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Paraphrase lattice for statistical machine translation
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Better Arabic parsing: baselines, evaluations, and analysis
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Nonparametric word segmentation for machine translation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Joint tokenization and translation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Machine translation with lattices and forests
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Word segmentation for dialect translation
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Word alignment combination over multiple word segmentation
HLT-SS '11 Proceedings of the ACL 2011 Student Session
Translating from morphologically complex languages: a paraphrase-based approach
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Language-independent compound splitting with morphological operations
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Better hypothesis testing for statistical machine translation: controlling for optimizer instability
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
The CMU-ARK German-English translation system
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Noisy SMS machine translation in low-density languages
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Joint models for Chinese POS tagging and dependency parsing
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Hierarchical Bayesian language modelling for the linguistically informed
EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
A class-based agreement model for generating accurately inflected translations
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Transliteration by sequence labeling with lattice encodings and reranking
NEWS '12 Proceedings of the 4th Named Entity Workshop
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
A Two-Phase Framework for Learning Logical Structures of Paragraphs in Legal Articles
ACM Transactions on Asian Language Information Processing (TALIP)
Generation of compound words in statistical machine translation into compounding languages
Computational Linguistics
Joint Optimization for Chinese POS Tagging and Dependency Parsing
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Hi-index | 0.00 |
Recent work has shown that translating segmentation lattices (lattices that encode alternative ways of breaking the input to an MT system into words), rather than text in any particular segmentation, improves translation quality of languages whose orthography does not mark morpheme boundaries. However, much of this work has relied on multiple segmenters that perform differently on the same input to generate sufficiently diverse source segmentation lattices. In this work, we describe a maximum entropy model of compound word splitting that relies on a few general features that can be used to generate segmentation lattices for most languages with productive compounding. Using a model optimized for German translation, we present results showing significant improvements in translation quality in German-English, Hungarian-English, and Turkish-English translation over state-of-the-art baselines.