A Corpus-Based Learning Method of Compound Noun Indexing Rules for Korean

Authors:
Jee-Hyub Kim;Byung-Kwan Kwak;Seungwoo Lee;Geunbae Lee;Jong-Hyeok Lee
Affiliations:
Biological Research Information Center (BRIC), Pohang, South Korea. kjh726@postech.ac.kr
Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea. nerguri@postech.ac.kr
Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea. pinesnow@postech.ac.kr
Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea. gblee@postech.ac.kr
Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea. jhlee@postech.ac.kr
Venue:
Information Retrieval
Year:
2001

Abstract

In Korean information retrieval, compound nouns play an important role in improving precision in search experiments. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each approach, however, has its own shortcomings, such as limited coverage of diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method that can index diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort than the manual linguistic approach, while achieving a similar level of performance. We also present a new filtering method to address the problems of compound noun over-generation and data sparseness.
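
The abstract does not spell out the rule format or the filtering criterion, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a rule maps a part-of-speech tag pattern to an ordering of the matched words, that rules seen fewer than `min_count` times in training are discarded, and that over-generated compounds are filtered by a simple frequency threshold. The function names, the tag set (`N`, `P`, `A`), and the English placeholder words standing in for Korean morphemes are all hypothetical.

```python
from collections import Counter

# Hypothetical toy training data: a POS-tagged phrase paired with the
# compound noun an indexer would assign to it (placeholder words and tags).
EXAMPLES = [
    ([("information", "N"), ("retrieval", "N")], ("information", "retrieval")),
    ([("compound", "N"), ("noun", "N")], ("compound", "noun")),
    ([("retrieval", "N"), ("of", "P"), ("information", "N")], ("information", "retrieval")),
    ([("indexing", "N"), ("of", "P"), ("documents", "N")], ("documents", "indexing")),
    ([("fast", "A"), ("parser", "N")], ("fast", "parser")),
]

def learn_rules(examples, min_count=2):
    """Count (tag pattern, word order) rules and keep the frequent ones."""
    counts = Counter()
    for tagged, compound in examples:
        words = [word for word, _ in tagged]
        tags = tuple(tag for _, tag in tagged)
        # Positions of the compound's words inside the phrase, in index order.
        order = tuple(words.index(word) for word in compound)
        counts[(tags, order)] += 1
    return {rule for rule, n in counts.items() if n >= min_count}

def apply_rules(tagged, rules):
    """Slide each learned tag pattern over a tagged sentence."""
    found = []
    for tags, order in sorted(rules):
        width = len(tags)
        for start in range(len(tagged) - width + 1):
            window = tagged[start:start + width]
            if tuple(tag for _, tag in window) == tags:
                found.append(" ".join(window[j][0] for j in order))
    return found

def filter_by_frequency(candidates, min_freq=2):
    """Assumed filter: drop generated compounds that are rare in the corpus."""
    counts = Counter(candidates)
    return {c for c, n in counts.items() if n >= min_freq}

rules = learn_rules(EXAMPLES)
sentence = [("analysis", "N"), ("of", "P"), ("text", "N"), ("corpus", "N")]
candidates = apply_rules(sentence, rules)
```

In this toy run the adjective-noun pattern occurs only once in training and is therefore not learned, while the two noun patterns each yield one candidate compound from the sample sentence. A real system would rely on a Korean morphological analyzer and a far larger corpus, and the paper's own filtering method may differ from the frequency threshold assumed here.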