Using self-supervised word segmentation in Chinese information retrieval

  • Authors:
  • Fuchun Peng; Xiangji Huang; Dale Schuurmans; Nick Cercone; Stephen E. Robertson

  • Affiliations:
  • University of Waterloo, Waterloo, Canada; University of Waterloo, Waterloo, Canada; University of Waterloo, Waterloo, Canada; University of Waterloo, Waterloo, Canada; Microsoft Research, Cambridge, U.K. and City University, London, U.K.

  • Venue:
  • SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2002

Abstract

We propose a self-supervised word-segmentation technique for Chinese information retrieval. This method combines the advantages of traditional dictionary-based approaches with those of character-based approaches, while overcoming many of their shortcomings. Experiments on TREC data show performance comparable to both the dictionary-based and the character-based approaches. However, our method is language-independent and unsupervised, which provides a promising avenue for constructing accurate multilingual information retrieval systems that are flexible and adaptive.
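The abstract compares against two conventional ways of turning Chinese text into index terms. The following is a minimal Python sketch of those two baselines only, assuming a toy hand-written lexicon. Greedy forward maximum matching is swapped in here as a representative dictionary-based segmenter, and overlapping character bigrams as a representative character-based indexing unit. The function names `max_match` and `char_bigrams` are hypothetical, and this is not the authors' self-supervised method, which the abstract does not describe.

```python
def max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching: a common dictionary-based segmenter.

    Hypothetical illustration; `lexicon` is an assumed toy dictionary.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon entry
        # matches; a single character is always accepted as a fallback.
        for j in range(min(len(text), i + max_len), i, -1):
            if j - i == 1 or text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
    return words


def char_bigrams(text: str) -> list[str]:
    """Overlapping character bigrams: a common character-based indexing unit."""
    if len(text) < 2:
        # Too short for a bigram; index the lone character (or nothing).
        return [text] if text else []
    return [text[k:k + 2] for k in range(len(text) - 1)]


if __name__ == "__main__":
    lexicon = {"北京", "大学", "大学生", "北京大学"}  # assumed toy lexicon
    print(max_match("北京大学生", lexicon))
    print(char_bigrams("北京大学生"))
```

On "北京大学生", greedy matching returns ["北京大学", "生"], whereas one plausible reading is 北京 / 大学生, so a fixed dictionary with a greedy rule can commit to the wrong split. The bigram indexer needs no dictionary but emits every adjacent pair, including 京大, which crosses the boundary between 北京 and 大学生.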