A Corpus-Based Learning Method of Compound Noun Indexing Rules for Korean

  • Authors:
  • Jee-Hyub Kim; Byung-Kwan Kwak; Seungwoo Lee; Geunbae Lee; Jong-Hyeok Lee

  • Affiliations:
  • Biological Research Information Center (BRIC), Pohang, South Korea. kjh726@postech.ac.kr
  • Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea. nerguri@postech.ac.kr
  • Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea. pinesnow@postech.ac.kr
  • Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea. gblee@postech.ac.kr
  • Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea. jhlee@postech.ac.kr

  • Venue:
  • Information Retrieval
  • Year:
  • 2001

Abstract

In Korean information retrieval, compound nouns play an important role in improving search precision. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each approach, however, has its own shortcomings, such as limitations when indexing diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method that can index diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort than the manual linguistic approach, while exhibiting a similar level of performance. We also present a new filtering method to solve the problems of compound noun over-generation and data sparseness.