Mining Synonymous Transliterations from the World Wide Web

  • Authors:
  • Chung-Chian Hsu;Chien-Hsing Chen

  • Affiliations:
  • National Yunlin University of Science and Technology;National Yunlin University of Science and Technology

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The World Wide Web has been considered one of the important sources for information. Using search engines to retrieve Web pages can gather lots of information, including foreign information. However, to be better understood by local readers, proper names in a foreign language, such as English, are often transliterated to a local language such as Chinese. Due to different translators and the lack of translation standard, translating foreign proper nouns may result in different transliterations and pose a notorious headache. In particular, it may cause incomplete search results. Using one transliteration as a query keyword will fail to retrieve the Web pages which use a different word as the transliteration. Consequently, important information may be missed. We present a framework for mining synonymous transliterations as many as possible from the Web for a given transliteration. The results can be used to construct a database of synonymous transliterations which can be utilized for query expansion so as to alleviate the incomplete search problem. Experimental results show that the proposed framework can effectively retrieve the set of snippets which may contain synonymous transliterations and then extract the target terms. Most of the extracted synonymous transliterations have higher rank of similarity to the input transliteration compared to other noise terms.