In Korean information retrieval, compound nouns play an important role in improving precision in search experiments. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each approach, however, has its own shortcomings, such as limited coverage of diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method that can index diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort, while achieving a performance level similar to that of the manual linguistic approach. We also present a new filtering method to address the problems of compound noun over-generation and data sparseness.