A probabilistic approach to compound noun indexing in Korean texts

  • Authors:
  • Hyouk R. Park;Young S. Han;Kang H. Lee;Key-Sun Choi

  • Affiliations:
  • Korea R&D Information Center/KIST, YuSong Taejon, Korea;Korea R&D Information Center/KIST, YuSong Taejon, Korea;Korea R&D Information Center/KIST, YuSong Taejon, Korea;Computer Science Department KAIST, YuSong Taejon, Korea

  • Venue:
  • COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
  • Year:
  • 1996

Abstract

In this paper we address the problem of compound noun indexing, that is, segmenting or decomposing compound nouns into promising index terms. Compound nouns used as index terms usually denote specific notions and therefore tend to increase retrieval precision. Using the component nouns of a compound noun as index terms, on the other hand, may improve recall but can decrease precision.

Our proposed method aims to increase recall while preserving precision. It computes the relevance of the component nouns of a compound noun to the document content by comparing the document sets supported by the component nouns with those supported by the terms of the document. The operational content of a term is represented as the probabilistic distribution of the term over the document set.

Experiments with a set of 1,000 documents show that our method yields a 33% increase in retrieval performance compared to the indexing method without compound noun analysis, and is as good as manual decomposition by human experts.
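
The idea of representing a term by its distribution over the document set, and scoring a component noun against the other terms of a document, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state how the distributions are estimated or compared, so the relative-frequency estimate, the cosine similarity, the averaging step, and all function names and toy documents below are assumptions made purely for illustration.

```python
from math import sqrt


def term_distribution(term, docs):
    """Relative frequency of `term` across documents (assumed estimate of its distribution)."""
    counts = [doc.count(term) for doc in docs]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def cosine(p, q):
    """Cosine similarity between two distributions (assumed comparison measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def component_relevance(component, doc_index, docs):
    """Average similarity between a component noun and the other terms of one document."""
    comp_dist = term_distribution(component, docs)
    context_terms = set(docs[doc_index]) - {component}
    if not context_terms:
        return 0.0
    sims = [cosine(comp_dist, term_distribution(t, docs)) for t in context_terms]
    return sum(sims) / len(sims)


# Hypothetical toy collection: each document is a list of already-segmented nouns.
docs = [
    ["index", "retrieval", "query", "system"],
    ["index", "retrieval", "query"],
    ["fruit", "system", "market"],
]

# Score both components of a hypothetical compound "retrieval system" in document 0.
for noun in ("retrieval", "system"):
    print(noun, round(component_relevance(noun, 0, docs), 3))
```

In this toy collection, "retrieval" is distributed over the same documents as the other terms of document 0, while "system" also occurs in an unrelated document, so the former scores higher. A component scoring below some threshold (also an assumption) would then be withheld as an index term.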