PM-based indexing for Chinese text retrieval

  • Authors:
  • Du Lin;Zhang Yibo;Sun Le;Sun Yufang;Han Jie

  • Affiliations:
  • Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China;Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China;Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China;Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China;Institute of Software, Chinese Academy of Sciences, Beijing, P.R.China

  • Venue:
  • IRAL '00 Proceedings of the fifth international workshop on Information retrieval with Asian languages
  • Year:
  • 2000

Abstract

This paper introduces a novel PM indexing scheme for Chinese text retrieval. Unlike Western languages, Chinese text has no delimiters between words, so indexing is based either on characters or on segmented words. In word-based indexing, out-of-vocabulary words such as proper nouns and domain terminology are often mis-segmented because of the limited vocabulary coverage of segmentation dictionaries, which impairs query precision. Several indexing and ranking methods, including the novel PM-based ranking, were tested to compare how well they deal with new words in Chinese text retrieval. The experiments show that the query precision of the PM + word method is 10% higher than that of word indexing.
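
The out-of-vocabulary problem described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's PM method, which the abstract does not specify. It only contrasts character-based indexing with dictionary-based word indexing, using forward maximum matching as an assumed stand-in for the segmenter. The function names, the toy vocabulary, and the two sample documents are all hypothetical.

```python
def char_tokens(text):
    """Character-based indexing units: every character is its own token."""
    return list(text)


def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching (assumed segmenter, not from the paper).

    At each position, take the longest dictionary word; if none matches,
    fall back to a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_index(docs, tokenize):
    """Build a simple inverted index: token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index


# Toy dictionary: the personal name "杜林" is deliberately missing (out of vocabulary).
vocab = {"研究", "中文", "检索", "先生"}

docs = {
    1: "杜林研究中文检索",  # mentions the name 杜林
    2: "林先生研究检索",    # mentions a different person, surnamed 林
}

word_index = build_index(docs, lambda text: fmm_segment(text, vocab))
char_index = build_index(docs, char_tokens)

# The query goes through the same segmenter, so the name is split into characters.
query_tokens = fmm_segment("杜林", vocab)

# Matching on any query token retrieves document 2 as well, a false hit.
matched = set()
for token in query_tokens:
    matched |= word_index.get(token, set())

print(query_tokens)
print(sorted(matched))
```

Because the name is absent from the dictionary, the word index contains no entry for it as a whole word. The query falls back to single characters and also matches document 2, which shows in miniature how mis-segmented new words can reduce query precision.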