A comparison of indexing techniques for Japanese text retrieval
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
A stochastic finite-state word-segmentation algorithm for Chinese
Computational Linguistics
A survey of multilingual text retrieval
A survey of multilingual text retrieval
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Comparing representations in Chinese information retrieval
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Chinese text retrieval without using a dictionary
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Overlapping statistical word indexing: a new indexing method for Japanese text
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Cross-language information retrieval with the UMLS metathesaurus
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
The Unicode standard version 3.0
The Unicode standard version 3.0
Cross-Language Information Retrieval
Cross-Language Information Retrieval
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Hi-index | 0.00 |
In this paper, we investigate cross language information retrieval (CLIR) for Chinese and Japanese texts utilizing the Han characters - common ideographs used in writing Chinese, Japanese and Korean (CJK) languages. The Unicode encoding scheme, which encodes the superset of Han characters, is used as a common encoding platform to deal with the multilingual collection in a uniform manner. We discuss the importance of Han character semantics in document indexing and retrieval of the ideographic languages. We also analyse the baseline results of the cross language information retrieval using the common Han characters appeared in both Chinese and Japanese texts.