Weighting models use lexical statistics, such as term frequencies, to derive term weights, which estimate the relevance of a document to a query. Apart from stopword removal, no further consideration is given to the quality of the words being 'weighted'. It is often assumed that term frequency is a good indicator of how relevant a document is to a query. Our intuition is that raw term frequency can be enhanced to better discriminate between terms. To this end, we propose using non-lexical features to predict the 'quality' of words before they are weighted for retrieval. Specifically, we show how parts of speech (e.g. nouns, verbs) can help estimate how informative a word generally is, regardless of its relevance to a particular query or document. Experimental results on two standard TREC collections show that integrating the proposed term quality into two established weighting models consistently improves retrieval performance over a baseline that uses the original weighting models.
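The idea of scaling a term's weight by a part-of-speech-based quality factor can be sketched as follows. This is a minimal illustration, not the paper's exact integration: the `POS_QUALITY` scores are invented for demonstration, and the weighting model shown is standard BM25 rather than the specific models evaluated in the paper.

```python
import math

# Hypothetical POS quality scores (illustrative values only): content-bearing
# categories such as nouns are assumed more informative than others.
POS_QUALITY = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.7, "ADV": 0.5, "OTHER": 0.3}

def bm25_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Standard BM25 weight of a term in a document."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / norm

def quality_adjusted_weight(tf, df, doc_len, avg_doc_len, n_docs, pos_tag):
    """BM25 weight scaled by a POS-based term-quality factor (a sketch of
    one possible integration, assuming multiplicative combination)."""
    quality = POS_QUALITY.get(pos_tag, POS_QUALITY["OTHER"])
    return quality * bm25_weight(tf, df, doc_len, avg_doc_len, n_docs)
```

With identical frequency statistics, a noun thus receives a higher weight than, say, an adverb, letting term quality discriminate between terms that raw term frequency alone would score equally.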