Learning from relevant documents in large scale routing retrieval

Authors:
K. L. Kwok;L. Grunfeld
Affiliations:
City University of New York, Flushing, NY;City University of New York, Flushing, NY
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Abstract

The normal practice in selecting relevant documents for training routing queries is either to use all relevants or to use the 'best n' of them after a (retrieval) ranking operation with respect to each query. Using all relevants can introduce noise and ambiguities into training because documents can be long, with many irrelevant portions. Using only the 'best n' risks leaving out documents that do not resemble a query. Based on a method of segmenting documents into subdocuments of more uniform size, a better approach is to use the top-ranked subdocument of every relevant. An alternative selection strategy is based on document properties without ranking. We found experimentally that short relevant documents are the quality items for training. Beginning portions of longer relevants are also useful. Using both types provides a strategy that is effective and efficient.
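
The two selection strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fixed-size token segmentation, the `size` parameter, the term-overlap score, and all function names are assumptions introduced here for illustration only. The abstract does not specify how documents are segmented or ranked.

```python
from typing import Callable, List

Tokens = List[str]


def segment(doc: Tokens, size: int) -> List[Tokens]:
    """Split a document into consecutive subdocuments of at most `size` tokens.

    Assumption: fixed-size token windows stand in for the segmentation method.
    """
    return [doc[i:i + size] for i in range(0, len(doc), size)] or [[]]


def overlap_score(query: Tokens, subdoc: Tokens) -> int:
    """Hypothetical ranking score: count of subdocument tokens found in the query."""
    query_terms = set(query)
    return sum(1 for token in subdoc if token in query_terms)


def top_subdocument(
    query: Tokens,
    doc: Tokens,
    size: int,
    score: Callable[[Tokens, Tokens], int] = overlap_score,
) -> Tokens:
    """Ranking-based selection: the best-scoring subdocument of one relevant document.

    On ties, the earliest subdocument is returned.
    """
    return max(segment(doc, size), key=lambda sub: score(query, sub))


def select_by_length(doc: Tokens, size: int) -> Tokens:
    """Ranking-free selection based on document length alone.

    A short document (at most `size` tokens) is kept whole; a longer one
    contributes only its beginning portion.
    """
    if len(doc) <= size:
        return list(doc)
    return doc[:size]


query = ["routing", "retrieval"]
long_doc = ["intro", "text", "here", "routing", "retrieval", "training"]
short_doc = ["routing", "queries"]

ranked_choice = top_subdocument(query, long_doc, size=3)
length_choice_long = select_by_length(long_doc, size=3)
length_choice_short = select_by_length(short_doc, size=3)
```

In this sketch the ranking-based strategy needs a query and a scoring pass over every subdocument, while the length-based strategy needs neither, which is one way to read the abstract's claim that the second approach is efficient as well as effective.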