Weighting models use lexical statistics, such as term frequencies, to derive term weights, which estimate the relevance of a document to a query. Apart from stopword removal, no further consideration is given to the quality of the words being 'weighted'. It is often assumed that term frequency is a good indicator of how relevant a document is to a query. Our intuition is that raw term frequency can be enhanced to better discriminate between terms. To this end, we propose using non-lexical features to predict the 'quality' of words before they are weighted for retrieval. Specifically, we show how parts of speech (e.g. nouns, verbs) can help estimate how informative a word generally is, regardless of its relevance to a particular query or document. Experimental results on two standard TREC collections show that integrating the proposed term quality into two established weighting models consistently improves retrieval performance over a baseline that uses the original weighting models.
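The idea of scaling a term's weight by a part-of-speech-based quality factor can be sketched as follows. This is a minimal illustration, not the paper's exact integration: the `POS_QUALITY` scores are invented for demonstration, and the weighting model shown is standard BM25 rather than the specific models evaluated in the paper.

```python
import math

# Hypothetical POS quality scores (illustrative values only): content-bearing
# categories such as nouns are assumed more informative than others.
POS_QUALITY = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.7, "ADV": 0.5, "OTHER": 0.3}

def bm25_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Standard BM25 weight of a term in a document."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / norm

def quality_adjusted_weight(tf, df, doc_len, avg_doc_len, n_docs, pos_tag):
    """BM25 weight scaled by a POS-based term-quality factor (a sketch of
    one possible integration, assuming multiplicative combination)."""
    quality = POS_QUALITY.get(pos_tag, POS_QUALITY["OTHER"])
    return quality * bm25_weight(tf, df, doc_len, avg_doc_len, n_docs)
```

With identical frequency statistics, a noun thus receives a higher weight than, say, an adverb, letting term quality discriminate between terms that raw term frequency alone would score equally.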