Automatic transliteration for Japanese-to-English text retrieval

Authors:
Yan Qu;Gregory Grefenstette;David A. Evans
Affiliations:
Clairvoyance Corporation, Pittsburgh, PA;Clairvoyance Corporation, Pittsburgh, PA;Clairvoyance Corporation, Pittsburgh, PA
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2003

Abstract

For cross-language information retrieval (CLIR) based on bilingual translation dictionaries, good performance depends on the lexical coverage of the dictionary. This is especially true for language pairs with few inter-language cognates, such as Japanese and English. In this paper, we describe a method for automatically creating and validating candidate Japanese transliterations of English words. A phonetic English dictionary and a set of probabilistic mapping rules are used to generate transliteration candidates automatically. A monolingual Japanese corpus is then used to validate the transliterated terms automatically. We evaluate the extracted English-Japanese transliteration pairs in Japanese-to-English retrieval experiments over the CLEF bilingual test collections. Extending a bilingual translation dictionary with our automatically derived pairs improves average precision, both before and after pseudo-relevance feedback, with gains ranging from 2.5% to 64.8%.
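The generate-then-validate idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy phonetic dictionary, the phoneme units, the rule probabilities, the corpus counts, and the function names `generate_candidates` and `validate` are all hypothetical. The sketch treats rule applications as independent (a candidate's score is the product of its rule probabilities) and accepts a candidate if it occurs at least once in the corpus; the paper's actual rule format and validation criterion may differ.

```python
from itertools import product

# Hypothetical toy phonetic dictionary: English word -> phoneme units.
PHONETIC_DICT = {
    "bus": ["B-AH", "S"],
    "taxi": ["T-AE", "K", "S-IY"],
}

# Hypothetical probabilistic mapping rules: phoneme unit -> (katakana, probability).
MAPPING_RULES = {
    "B-AH": [("バ", 0.9), ("ブ", 0.1)],
    "S": [("ス", 0.8), ("シ", 0.2)],
    "T-AE": [("タ", 0.7), ("テ", 0.3)],
    "K": [("ク", 0.9), ("キ", 0.1)],
    "S-IY": [("シー", 0.6), ("シ", 0.4)],
}

# Hypothetical term frequencies from a monolingual Japanese corpus.
CORPUS_COUNTS = {"バス": 120, "タクシー": 85, "タクシ": 3}


def generate_candidates(word):
    """Return (katakana, score) pairs for a word, highest score first."""
    options = [MAPPING_RULES[unit] for unit in PHONETIC_DICT[word]]
    candidates = []
    for combo in product(*options):
        kana = "".join(k for k, _ in combo)
        score = 1.0
        for _, p in combo:
            score *= p  # assumption: rule applications are independent
        candidates.append((kana, score))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def validate(candidates, corpus_counts, min_count=1):
    """Keep only candidates attested at least min_count times in the corpus."""
    return [(k, s) for k, s in candidates if corpus_counts.get(k, 0) >= min_count]


if __name__ == "__main__":
    for word in PHONETIC_DICT:
        print(word, validate(generate_candidates(word), CORPUS_COUNTS))
```

In this toy setup, generation deliberately over-produces (four candidates for "bus", eight for "taxi"), and the corpus check discards the unattested ones; the surviving pairs are the kind of entries that could extend a bilingual dictionary.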