Computational Linguistics
Hi-index | 0.00 |
In Japanese, it is quite common for the same word to be written in several different ways. This is especially true for katakana words which are typically used for transliterating foreign languages. This ambiguity becomes critical for automatic processing such as information retrieval (IR). To tackle this problem, we propose a simple but effective approach to generating katakana variants by considering phonemic representation of the original language for a given word. The proposed approach is evaluated through an assessment of the variants it generates. Also, the impact of the generated variants on IR is studied in comparison to an existing approach using katakana rewriting rules.