Chinese-Japanese cross language information retrieval: a Han character based approach

Authors:
Md Maruf Hasan;Yuji Matsumoto
Affiliations:
Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan
Venue:
WWSM '00 Proceedings of the ACL-2000 workshop on Word senses and multi-linguality - Volume 8
Year:
2000



Abstract

In this paper, we investigate cross-language information retrieval (CLIR) for Chinese and Japanese texts using Han characters, the common ideographs used in writing Chinese, Japanese and Korean (CJK). The Unicode encoding scheme, which encodes a unified superset of the Han characters used in these languages, serves as a common encoding platform so that the multilingual collection can be handled uniformly. We discuss the importance of Han character semantics in indexing and retrieving documents written in ideographic languages. We also analyse baseline CLIR results obtained using the Han characters that appear in both Chinese and Japanese texts.
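The idea of using Unicode as a shared encoding and matching Chinese and Japanese texts through the Han characters they share can be illustrated with a minimal Python sketch. It reflects our own assumptions and is not the authors' implementation. Han characters are identified by Unicode block ranges. Each text is reduced to Han unigrams plus overlapping Han bigrams, a common dictionary-free indexing choice for ideographic text. Documents are then ranked by cosine similarity of raw term frequencies. The helper names (`is_han`, `han_runs`, `han_terms`, `cosine`, `rank`) are hypothetical.

```python
import math
from collections import Counter

# Unicode block ranges treated as Han characters in this sketch (assumption:
# hiragana, katakana, Latin letters, digits and punctuation are all ignored).
HAN_RANGES = (
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def is_han(ch: str) -> bool:
    """Return True if the character falls in one of the Han ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)


def han_runs(text: str) -> list[str]:
    """Split text into maximal runs of consecutive Han characters."""
    runs, current = [], []
    for ch in text:
        if is_han(ch):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


def han_terms(text: str) -> Counter:
    """Index terms: Han unigrams plus overlapping bigrams within each run."""
    terms = Counter()
    for run in han_runs(text):
        terms.update(run)  # unigrams: one count per character
        terms.update(run[i:i + 2] for i in range(len(run) - 1))  # bigrams
    return terms


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two raw term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents in one language against a query in the other."""
    q = han_terms(query)
    scores = [(doc_id, cosine(q, han_terms(text))) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Chinese query against two Japanese documents (illustrative data).
    docs = {"A": "日本の文学を研究する", "B": "東京の天気は晴れ"}
    for doc_id, score in rank("中国文学研究", docs):
        print(doc_id, round(score, 3))
```

Here the Chinese query 中国文学研究 shares the unigrams and bigrams of 文学 and 研究 with document A, so A scores above zero while B, which shares no Han characters with the query, scores 0.0. Because the sketch compares code points exactly, variant forms such as simplified Chinese 检 and Japanese 検 (in 检索 and 検索, "retrieval") would not match. Handling such variants would need an additional character mapping that this sketch does not include.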