Experiments in Japanese text retrieval and routing using the NEAT system
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Hybrid term indexing for different IR models
IRAL '00 Proceedings of the fifth international workshop on Information retrieval with Asian languages
Lexicon-based orthographic disambiguation in CJK intelligent information retrieval
COLING '02 Proceedings of the 3rd workshop on Asian language resources and international standardization - Volume 12
Orthographic varieties are common in Japanese and pose a serious problem for Japanese information retrieval (IR): a system risks missing documents that contain variant forms of the search term. We propose two strategies for handling orthographic varieties: pronunciation-based (yomi) indexing, and “Fuzzy Querying”, which compares katakana terms by edit distance. Both strategies were integrated into our multiple index and fusion system [1] and tested on two test collections, newspaper articles (Mainichi Shimbun ’98) and scientific abstracts (NTCIR-1), to compare their performance across text genres.
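As a rough illustration of comparing katakana terms by edit distance, the sketch below uses plain Levenshtein distance over characters. The abstract does not say which edit-distance variant or threshold the system uses, so the function names `edit_distance` and `fuzzy_variants`, the `max_distance` parameter, and the sample terms are all assumptions made for this example, not the authors' implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_variants(query: str, index_terms: list[str], max_distance: int = 1) -> list[str]:
    """Return index terms within max_distance edits of the query (hypothetical helper)."""
    return [t for t in index_terms if edit_distance(query, t) <= max_distance]


# Hypothetical index vocabulary: two spellings of "computer" and an unrelated term.
terms = ["コンピューター", "コンピュータ", "プリンター"]
matches = fuzzy_variants("コンピュータ", terms, max_distance=1)
```

Here a query for コンピュータ would also retrieve the long-vowel variant コンピューター, which differs by a single inserted character. A threshold of 1 is only a guess; a larger value finds more variants at the cost of more false matches.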