Mining the Web to Create a Language Model for Mapping between English Names and Phrases and Japanese

  • Authors:
  • Gregory Grefenstette; Yan Qu; David A. Evans

  • Affiliations:
  • LIC2M/LIST/CEA, France; Clairvoyance Corporation, Pittsburgh, PA; Clairvoyance Corporation, Pittsburgh, PA

  • Venue:
  • WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2004

Abstract

The Web provides the largest exploitable collection of language use. If we can mine the Web to build abstract models of language use, these models may have many applications. Here we present one example of using the implicit intelligence of language use to solve an important problem for machine translation programs and cross-lingual applications: the translation of Japanese words written in katakana characters. In this paper, we describe techniques for discovering katakana transliterations of English names and for finding English translations of multiword katakana sequences, using implicit language models of English and Japanese found on the Web. These techniques were evaluated against human-constructed English-katakana glosses.
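
The abstract does not spell out how the Web is used as an implicit language model, so the following is only an illustrative sketch of one plausible reading: given several candidate katakana spellings for an English word, prefer the one most frequently attested on the Web. The function name `pick_by_web_frequency`, the candidate strings, and the hit counts are all hypothetical and invented for illustration; they are not taken from the paper, and no real search API is called.

```python
from typing import Dict, List, Optional


def pick_by_web_frequency(
    candidates: List[str], hit_counts: Dict[str, int]
) -> Optional[str]:
    """Return the candidate with the highest count, or None if none is attested.

    `hit_counts` stands in for page counts that would come from Web queries;
    here it is just a plain dictionary (an assumption of this sketch).
    """
    best: Optional[str] = None
    best_count = 0
    for cand in candidates:
        count = hit_counts.get(cand, 0)  # unseen candidates count as zero
        if count > best_count:
            best, best_count = cand, count
    return best


# Hypothetical candidate spellings for "computer" with made-up counts.
candidates = ["コンピュータ", "コンピュタ"]
made_up_counts = {"コンピュータ": 120, "コンピュタ": 3}
print(pick_by_web_frequency(candidates, made_up_counts))
```

In this sketch a candidate that never appears in the counts is rejected outright, which mirrors the idea that real usage on the Web, rather than a hand-built dictionary, decides which transliteration is accepted.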