Mining the Web to Create a Language Model for Mapping between English Names and Phrases and Japanese

Authors:
Gregory Grefenstette; Yan Qu; David A. Evans
Affiliations:
LIC2M/LIST/CEA, France; Clairvoyance Corporation, Pittsburgh, PA; Clairvoyance Corporation, Pittsburgh, PA
Venue:
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2004

Abstract

The Web provides the largest exploitable collection of language use. If we can mine the Web to build abstract models of language use, these models may have many applications. Here we present one example of using the implicit intelligence of language use to solve an important problem for machine translation programs and cross-lingual applications: the translation of words written in katakana characters in Japanese. In this paper, we describe techniques for discovering katakana transliterations of English names and for finding English translations of multiword katakana sequences, using implicit language models of English and Japanese found on the Web. These techniques were evaluated against human-constructed English-katakana glosses.