In this paper, we present a corpus-based learning method that indexes diverse types of compound nouns using rules automatically extracted from a large tagged corpus. We develop an efficient way of extracting the compound noun indexing rules automatically and perform extensive experiments to evaluate them. The automatic learning method performs about as well as the manual linguistic approach but is more portable and requires no human effort. We also evaluate seven different filtering methods in terms of both effectiveness and efficiency, and present a new method that addresses the problems of compound noun over-generation and data sparseness in statistical compound noun processing.
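
The general idea of learning indexing rules from a tagged corpus and then filtering the generated compounds might be sketched as follows. This is only an illustration under stated assumptions, not the method evaluated in the paper: the rule representation (a plain part-of-speech tag sequence), the function names `learn_rules`, `index_compounds`, and `frequency_filter`, the seed set of known compounds, and the tag labels are all hypothetical. The frequency cutoff is a simple stand-in for a filter and is not claimed to be one of the seven filtering methods.

```python
from collections import Counter


def learn_rules(tagged_sentences, known_compounds, min_count=1):
    """Collect tag patterns of 2- and 3-word spans that form a known compound.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    known_compounds: set of word tuples used as training examples (assumed).
    """
    counts = Counter()
    for sent in tagged_sentences:
        for n in (2, 3):
            for i in range(len(sent) - n + 1):
                span = sent[i:i + n]
                words = tuple(w for w, _ in span)
                if words in known_compounds:
                    counts[tuple(t for _, t in span)] += 1
    # Keep only tag patterns seen at least min_count times.
    return {pattern for pattern, c in counts.items() if c >= min_count}


def index_compounds(tagged_sentence, rules):
    """Emit every span whose tag sequence matches a learned rule."""
    terms = []
    lengths = sorted({len(r) for r in rules})
    for i in range(len(tagged_sentence)):
        for n in lengths:
            span = tagged_sentence[i:i + n]
            if len(span) == n and tuple(t for _, t in span) in rules:
                terms.append(" ".join(w for w, _ in span))
    return terms


def frequency_filter(candidates, min_count=2):
    """Drop candidates generated fewer than min_count times (illustrative)."""
    counts = Counter(candidates)
    return sorted(t for t, c in counts.items() if c >= min_count)


# Toy data; the tags "NN", "VBZ", "JJ" are illustrative labels only.
corpus = [[("information", "NN"), ("retrieval", "NN"),
           ("is", "VBZ"), ("useful", "JJ")]]
rules = learn_rules(corpus, {("information", "retrieval")})
terms = index_compounds(
    [("text", "NN"), ("categorization", "NN"), ("works", "VBZ")], rules)
```

In this toy run the single training compound yields one rule, the tag pattern noun followed by noun, which then matches "text categorization" in the unseen sentence. Matching on tag patterns alone tends to over-generate, which is why a filtering step such as the frequency cutoff is applied afterwards.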