The probability that a term appears in relevant documents, P(t | R), is a fundamental quantity in several probabilistic retrieval models, but it is difficult to estimate without relevance judgments or a relevance model. We call this value term necessity because it measures the percentage of relevant documents retrieved by the term, that is, how necessary the term's occurrence is to document relevance. Prior research typically either set this probability to a constant or estimated it from the term's inverse document frequency; neither approach was very effective. This paper identifies several factors that affect term necessity, for example a term's topic centrality, synonymy, and abstractness. It develops term- and query-dependent features for each factor, which enable supervised learning of a predictive model of term necessity from training data. Experiments with two popular retrieval models and six standard datasets demonstrate that using predicted term necessity estimates as user term weights for the original query terms leads to significant improvements in retrieval accuracy.
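
To make the definition of term necessity concrete, the following minimal sketch computes P(t | R) directly from relevance judgments, as the fraction of judged-relevant documents that contain the term. It is an illustration only and not the paper's prediction method, which learns the value without judgments. The function name `term_necessity`, the set-of-terms document representation, and the toy documents are assumptions made for the example.

```python
from typing import Iterable, Set


def term_necessity(term: str, relevant_docs: Iterable[Set[str]]) -> float:
    """Fraction of relevant documents that contain `term` (an estimate of P(t | R)).

    Hypothetical helper: each document is modeled as a set of its terms.
    """
    docs = list(relevant_docs)
    if not docs:
        raise ValueError("need at least one relevant document")
    containing = sum(1 for doc in docs if term in doc)
    return containing / len(docs)


# Toy judged-relevant set for an imagined query "cancer prognosis" (made-up data).
relevant = [
    {"prognosis", "cancer", "treatment"},
    {"cancer", "outlook", "therapy"},
    {"cancer", "prognosis", "survival"},
    {"tumor", "survival", "rates"},
]

necessity_cancer = term_necessity("cancer", relevant)
necessity_prognosis = term_necessity("prognosis", relevant)
```

In this toy set, "prognosis" is absent from documents that use the synonym "outlook" or discuss survival rates instead, so its necessity is lower than that of "cancer". This is the kind of synonymy effect the abstract lists among the factors that affect term necessity.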