Corpus-based learning of compound noun indexing

  • Authors:
  • Byung-Kwan Kwak;Jee-Hyub Kim;Geunbae Lee;Jung Yun Seo

  • Affiliations:
  • Pohang University of Science & Technology (POSTECH);Pohang University of Science & Technology (POSTECH);Pohang University of Science & Technology (POSTECH);Sogang University

  • Venue:
  • RANLPIR '00 Proceedings of the ACL-2000 workshop on Recent advances in natural language processing and information retrieval: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 11
  • Year:
  • 2000

Abstract

In this paper, we present a corpus-based learning method that can index diverse types of compound nouns using rules automatically extracted from a large tagged corpus. We develop an efficient way of extracting the compound noun indexing rules automatically and perform extensive experiments to evaluate them. The automatic learning method performs about as well as the manual linguistic approach, but it is more portable and requires no human effort. We also evaluate seven different filtering methods in terms of both effectiveness and efficiency, and present a new method that addresses the problems of compound noun over-generation and data sparseness in statistical compound noun processing.