Discriminative substring decoding for transliteration

Authors:
Colin Cherry;Hisami Suzuki
Affiliations:
Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Year:
2009

Citing 16
Cited 5

Learning String-Edit Distance

IEEE Transactions on Pattern Analysis and Machine Intelligence
Machine transliteration

Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
An improved error model for noisy channel spelling correction

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Machine transliteration of names in Arabic text

SEMITIC '02 Proceedings of the ACL-02 workshop on Computational approaches to semitic languages
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
A joint source-channel model for machine transliteration

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Dependency treelet translation: syntactically informed phrasal SMT

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Discriminative syntactic language modeling for speech recognition

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
An end-to-end discriminative approach to machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A hybrid back-transliteration system for Japanese

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Named entity transliteration and discovery from multilingual comparable corpora

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Competitive generative models with structure learning for NLP classification tasks

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Discriminative methods for transliteration

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Latent-variable modeling of string transductions with finite-state methods

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Predicting word pronunciation in Japanese

CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
MSR SPLAT, a language analysis toolkit

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session
Multilingual named entity recognition using parallel data and metadata from Wikipedia

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
A unified approach to transliteration-based text input with online spelling correction

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Applying mpaligner to machine transliteration with Japanese-specific heuristics

NEWS '12 Proceedings of the 4th Named Entity Workshop

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a discriminative substring decoder for transliteration. This decoder extends recent approaches for discriminative character transduction by allowing for a list of known target-language words, an important resource for transliteration. Our approach improves upon Sherif and Kondrak's (2007b) state-of-the-art decoder, creating a 28.5% relative improvement in transliteration accuracy on a Japanese katakana-to-English task. We also conduct a controlled comparison of two feature paradigms for discriminative training: indicators and hybrid generative features. Surprisingly, the generative hybrid outperforms its purely discriminative counterpart, despite losing access to rich source-context features. Finally, we show that machine transliterations have a positive impact on machine translation quality, improving human judgments by 0.5 on a 4-point scale.