Generating diverse katakana variants based on phonemic mapping

  • Authors: Kazuhiro Seki; Hiroyuki Hattori; Kuniaki Uehara
  • Affiliations: Kobe University, Kobe, Japan; Google Inc., Shibuya, Japan; Kobe University, Kobe, Japan
  • Venue: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
  • Year: 2008

Abstract

In Japanese, it is quite common for the same word to be written in several different ways. This is especially true for katakana words, which are typically used for transliterating foreign words. This ambiguity becomes critical for automatic processing such as information retrieval (IR). To tackle this problem, we propose a simple but effective approach that generates katakana variants of a given word by considering the phonemic representation of the word in its original language. The proposed approach is evaluated through an assessment of the variants it generates. In addition, the impact of the generated variants on IR is studied in comparison with an existing approach based on katakana rewriting rules.
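
The abstract does not give the details of the phonemic mapping, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a word has already been split into phoneme chunks and that each chunk has a small set of plausible katakana renderings. The table `PHONEME_TO_KATAKANA`, the chunking of "violin", and the function `generate_variants` are all hypothetical and chosen for illustration.

```python
from itertools import product

# Hypothetical toy table (an assumption, not taken from the paper):
# each phoneme chunk maps to the katakana strings that could render it.
PHONEME_TO_KATAKANA = {
    "va": ["ヴァ", "バ"],  # /v/ has no single standard katakana rendering
    "i": ["イ"],
    "o": ["オ"],
    "li": ["リ"],
    "n": ["ン"],
}


def generate_variants(phoneme_chunks):
    """Return every katakana string obtainable by choosing one
    rendering per phoneme chunk (a Cartesian product of the choices)."""
    candidates = [PHONEME_TO_KATAKANA[chunk] for chunk in phoneme_chunks]
    return ["".join(parts) for parts in product(*candidates)]


# Simplified, assumed chunking of the English word "violin".
variants = generate_variants(["va", "i", "o", "li", "n"])
print(variants)  # ['ヴァイオリン', 'バイオリン']
```

In an IR setting, variants produced this way could be added to a query as alternative spellings of the same term. A realistic system would also need a way to rank or prune the candidates, since the number of combinations grows with every ambiguous chunk.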