Latent class transliteration based on source language origin

Authors:
Masato Hagiwara;Satoshi Sekine
Affiliations:
Rakuten Institute of Technology, New York, Park Avenue South, New York, NY;Rakuten Institute of Technology, New York, Park Avenue South, New York, NY
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Citing 4
Cited 3

Machine transliteration

Computational Linguistics
An improved error model for noisy channel spelling correction

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Learning a spelling error model from search query logs

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Japanese query alteration based on semantic similarity

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

G2P conversion of proper names using word origin information

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Latent semantic transliteration using dirichlet mixture

NEWS '12 Proceedings of the 4th Named Entity Workshop
Applying mpaligner to machine transliteration with Japanese-specific heuristics

NEWS '12 Proceedings of the 4th Named Entity Workshop

Quantified Score

Hi-index	0.00

Visualization

Abstract

Transliteration, a rich source of proper noun spelling variations, is usually recognized by phonetic- or spelling-based models. However, a single model cannot deal with different words from different language origins, e.g., "get" in "piaget" and "target." Li et al. (2007) propose a method which explicitly models and classifies the source language origins and switches transliteration models accordingly. This model, however, requires an explicitly tagged training set with language origins. We propose a novel method which models language origins as latent classes. The parameters are learned from a set of transliterated word pairs via the EM algorithm. The experimental results of the transliteration task of Western names to Japanese show that the proposed model can achieve higher accuracy compared to the conventional models without latent classes.