Selecting effective index terms using a decision tree

Authors:
Tokunaga Takenobu; Kimura Kenji; Ogibayashi Hironori; Tanaka Hozumi
Affiliations:
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan (all four authors)
Venue:
Natural Language Engineering
Year:
2002

Abstract

This paper explores the effectiveness of index terms that are more complex than the single words used in conventional information retrieval systems. Retrieval is done in two phases: the first uses a conventional retrieval method (the Okapi system); the second introduces complex index terms, such as syntactic relations and single words with part-of-speech information, to rerank the results of the first phase. We evaluated the effectiveness of the different types of index terms through experiments with the TREC-7 test collection and 50 queries; retrieval effectiveness improved for 32 of the 50 queries. Based on this investigation, we then introduce a method that uses a decision tree to select effective index terms. Further experiments with the same test collection showed that retrieval effectiveness improved for 25 of the 50 queries.
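
The two-phase scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `IndexTerm` class, the `select_term` rule (a hand-written stand-in for a learned decision tree), the thresholds, the additive bonus in `rerank`, and all sample data are hypothetical and chosen only for illustration. First-phase scores are assumed to come from some baseline ranker such as Okapi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexTerm:
    text: str      # e.g. "index/NN" or "select->term" (illustrative notation)
    kind: str      # "pos_word" or "relation" (hypothetical labels)
    doc_freq: int  # number of collection documents containing the term


def select_term(term: IndexTerm, n_docs: int) -> bool:
    """Hand-written stand-in for a learned decision tree.

    The features and thresholds are assumptions, not values from the paper.
    """
    ratio = term.doc_freq / n_docs
    if term.kind == "relation":
        return ratio < 0.5
    return ratio < 0.1


def rerank(first_phase, doc_terms, query_terms, n_docs, weight=1.0):
    """Rerank first-phase results using only the selected complex terms.

    first_phase: list of (doc_id, score) pairs from the baseline ranker.
    doc_terms:   dict mapping doc_id to the set of complex term texts it contains.
    """
    selected = [t for t in query_terms if select_term(t, n_docs)]
    rescored = []
    for doc_id, score in first_phase:
        terms_in_doc = doc_terms.get(doc_id, set())
        # Assumed scoring: one bonus point per selected term found in the document.
        bonus = sum(1 for t in selected if t.text in terms_in_doc)
        rescored.append((doc_id, score + weight * bonus))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical first-phase output and complex-term index.
first_phase = [("d1", 2.0), ("d2", 1.8), ("d3", 0.5)]
doc_terms = {
    "d1": {"index/NN"},
    "d2": {"select->term", "index/NN"},
    "d3": set(),
}
query_terms = [
    IndexTerm("select->term", "relation", 20),
    IndexTerm("index/NN", "pos_word", 400),
]
ranking = rerank(first_phase, doc_terms, query_terms, n_docs=1000)
```

In this toy run the selector keeps the rare relation term and rejects the frequent part-of-speech term, so only the document containing the relation receives a bonus in the second phase.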