Term necessity prediction

  • Authors:
  • Le Zhao; Jamie Callan

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA

  • Venue:
  • CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management
  • Year:
  • 2010

Abstract

The probability that a term appears in relevant documents (P(t | R)) is a fundamental quantity in several probabilistic retrieval models; however, it is difficult to estimate without relevance judgments or a relevance model. We call this value term necessity because it measures the percentage of relevant documents retrieved by the term, i.e., how necessary a term's occurrence is to document relevance. Prior research typically either set this probability to a constant or estimated it from the term's inverse document frequency, neither of which was very effective. This paper identifies several factors that affect term necessity, such as a term's topic centrality, synonymy, and abstractness. It develops term- and query-dependent features for each factor that enable supervised learning of a predictive model of term necessity from training data. Experiments with two popular retrieval models and six standard datasets demonstrate that using predicted term necessity estimates as user term weights on the original query terms leads to significant improvements in retrieval accuracy.
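
To make the abstract's central quantity concrete, the sketch below (a toy illustration, not the authors' implementation; the helper names and example data are invented) computes the empirical term necessity P(t | R) as the fraction of relevant documents containing a term, and shows how such values could serve as per-term user weights on the original query terms, which is how the predicted estimates are applied in the paper's experiments.

```python
# Illustrative sketch only: the helpers, the toy judgments, and the use of exact
# relevance judgments are assumptions made for illustration. The paper's point is
# to *predict* P(t | R) from term- and query-dependent features when no judgments
# are available; this only shows what the quantity measures and how it could be
# used to weight query terms.

from typing import Dict, Iterable, List, Set, Tuple


def term_necessity(term: str, relevant_docs: Iterable[Set[str]]) -> float:
    """Empirical P(t | R): fraction of relevant documents that contain `term`."""
    docs = list(relevant_docs)
    if not docs:
        return 0.0
    return sum(1 for doc in docs if term in doc) / len(docs)


def weighted_query(query_terms: List[str],
                   necessity: Dict[str, float]) -> List[Tuple[str, float]]:
    """Pair each original query term with its (predicted) necessity, to be used
    as a user term weight by a retrieval model that supports weighted queries."""
    return [(t, necessity.get(t, 1.0)) for t in query_terms]


if __name__ == "__main__":
    # Toy relevance judgments: each relevant document represented as its term set.
    relevant = [
        {"oil", "spill", "cleanup", "gulf"},
        {"oil", "spill", "tanker"},
        {"oil", "pollution", "coast"},
    ]
    query = ["oil", "spill", "cleanup"]
    necessities = {t: term_necessity(t, relevant) for t in query}
    print(necessities)                      # oil -> 1.0, spill -> 2/3, cleanup -> 1/3
    print(weighted_query(query, necessities))
```

A term occurring in every relevant document (necessity 1.0) is one the retrieval model should insist on, while a low-necessity term (e.g. one with many synonyms in the relevant set) should be down-weighted rather than required.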