Chinese-Japanese cross language information retrieval: a Han character based approach

  • Authors:
  • Md Maruf Hasan;Yuji Matsumoto

  • Affiliations:
  • Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan

  • Venue:
  • WWSM '00 Proceedings of the ACL-2000 workshop on Word senses and multi-linguality - Volume 8
  • Year:
  • 2000

Abstract

In this paper, we investigate cross language information retrieval (CLIR) for Chinese and Japanese texts using Han characters, the common ideographs used in writing the Chinese, Japanese and Korean (CJK) languages. The Unicode encoding scheme, which encodes a superset of the Han characters, serves as a common encoding platform so that the multilingual collection can be handled in a uniform manner. We discuss the importance of Han character semantics in document indexing and retrieval for ideographic languages. We also analyse baseline results for cross language information retrieval using the common Han characters that appear in both Chinese and Japanese texts.
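
As a rough illustration of the idea of matching on shared Han characters, the following minimal Python sketch ranks Japanese documents against a Chinese query by character overlap. It is not the authors' method: the function names (han_chars, overlap_score, rank), the restriction to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the use of a Dice coefficient are all assumptions made for this example.

```python
def han_chars(text):
    """Return the characters of `text` that fall in the basic
    CJK Unified Ideographs block (U+4E00-U+9FFF).
    Assumption: extension blocks and compatibility ideographs are ignored."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def overlap_score(query, document):
    """Dice coefficient over the sets of Han characters in query and document.
    Assumption: a simple set-overlap measure chosen only for illustration."""
    q = set(han_chars(query))
    d = set(han_chars(document))
    if not q or not d:
        return 0.0
    return 2 * len(q & d) / (len(q) + len(d))


def rank(query, documents):
    """Sort documents by descending overlap with the query."""
    scored = [(overlap_score(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical example: a Simplified Chinese query against Japanese documents.
chinese_query = "大学图书馆"
japanese_docs = ["東京の天気", "大学の図書館"]
ranking = rank(chinese_query, japanese_docs)
```

In this example only 大 and 学 are shared between the query and the second document, because 图/図, 书/書 and 馆/館 are encoded as distinct Unicode code points. Plain code-point matching therefore misses such variant pairs, which is one reason character-level results of this kind are best treated as a baseline.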