Using variable length ngrams for retrieving technical abstracts in Japanese (poster session)

  • Authors:
  • Lin Feng;Kyoji Umemura;Mikio Yamamoto;Kenneth W. Church

  • Affiliations:
  • Toyohashi University of Technology, Department of Information and Computer Sciences, Japan;Toyohashi University of Technology, Department of Information and Computer Sciences, Japan;University of Tsukuba, Institution of Computer Sciences and Electronics, Japan;AT&T Labs - Research

  • Venue:
  • IRAL '00 Proceedings of the fifth international workshop on Information retrieval with Asian languages
  • Year:
  • 2000

Abstract

Previous studies have reported that bigrams work well for many Asian languages, including Chinese, Korean and Japanese. Most of these studies have focused on newspaper texts. We report an experiment with a very different genre (technical abstracts) and find that performance can be improved by combining both short and long ngrams. Working with ngrams of all lengths is a sound approach, since together they carry more information than bigrams alone.
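The difference between a bigram representation and one that mixes short and long ngrams can be illustrated with a small character-ngram sketch. This is not the authors' retrieval model. The length cap `max_n`, the example strings, and the `overlap` score (a plain count of query ngram types found in a document) are hypothetical choices made only for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count every contiguous character ngram of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def all_ngrams(text: str, max_n: int) -> Counter:
    """Count ngrams of every length from 1 to max_n (max_n is a hypothetical cap)."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(char_ngrams(text, n))
    return counts


def overlap(query: Counter, doc: Counter) -> int:
    """Hypothetical score: number of distinct query ngrams that also occur in doc."""
    return sum(1 for gram in query if gram in doc)


if __name__ == "__main__":
    query = "情報検索"            # "information retrieval"
    doc_a = "情報検索システム"    # contains the query as a contiguous phrase
    doc_b = "検索の情報"          # contains the words, but not the phrase

    # Bigrams only: both documents look similar.
    q2 = char_ngrams(query, 2)
    print(overlap(q2, char_ngrams(doc_a, 2)), "/", len(q2))  # 3 / 3
    print(overlap(q2, char_ngrams(doc_b, 2)), "/", len(q2))  # 2 / 3

    # Ngrams of lengths 1..4: longer ngrams separate the documents more sharply.
    qa = all_ngrams(query, 4)
    print(overlap(qa, all_ngrams(doc_a, 4)), "/", len(qa))   # 10 / 10
    print(overlap(qa, all_ngrams(doc_b, 4)), "/", len(qa))   # 6 / 10
```

Both documents share most of the query's bigrams, but only the first contains longer ngrams such as 情報検索, so the all-length representation distinguishes them more clearly (10 of 10 query ngrams versus 6 of 10, compared with 3 of 3 versus 2 of 3 for bigrams alone).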