Effective term weighting for sentence retrieval

  • Authors: Saeedeh Momtazi, Matthew Lease, Dietrich Klakow

  • Affiliations: Spoken Language Systems, Saarland University, Germany; School of Information, University of Texas at Austin; Spoken Language Systems, Saarland University, Germany

  • Venue: ECDL'10: Proceedings of the 14th European Conference on Research and Advanced Technology for Digital Libraries
  • Year: 2010

Abstract

A well-known challenge of information retrieval is how to infer a user's underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core information need from supporting context? We investigate three simple term-weighting schemes for such estimation within the language modeling retrieval paradigm [6]. While the three schemes described are ad hoc, they address a principled estimation problem underlying the standard word unigram model. We also show that these schemes enable better estimation of a state-of-the-art class model based on term clustering [5]. Using a TREC QA dataset, we evaluate the three weighting schemes for both word and class models on the QA subtask of sentence retrieval. Our inverse sentence frequency weighting scheme achieves over 5% absolute improvement in mean average precision for the standard word model and nearly 2% absolute improvement for the class model.
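
The abstract does not give the exact scoring formula, but the following Python sketch illustrates the general idea under common assumptions: query-likelihood sentence retrieval with a Jelinek-Mercer-smoothed word unigram model, where each query term's log-probability contribution is scaled by a weight derived from inverse sentence frequency (ISF). The smoothing parameter, the specific ISF formula, and the function names are illustrative choices, not the authors' published method.

```python
import math
from collections import Counter

def isf_weights(query_terms, sentences):
    """Assumed inverse-sentence-frequency weighting: query terms that occur
    in fewer sentences of the collection receive larger weights (an IDF
    analogue computed over sentences rather than documents)."""
    n = len(sentences)
    weights = {}
    for t in set(query_terms):
        sf = sum(1 for s in sentences if t in s)       # sentence frequency
        weights[t] = math.log((n + 1) / (sf + 0.5))    # hypothetical smoothing constants
    return weights

def score_sentence(query_terms, sentence, collection_counts, collection_len,
                   weights, lam=0.4):
    """Weighted query likelihood under a Jelinek-Mercer-smoothed unigram
    model: sum over query terms of w(t) * log P(t | sentence)."""
    s_counts = Counter(sentence)
    s_len = max(len(sentence), 1)
    score = 0.0
    for t in query_terms:
        p_sent = s_counts[t] / s_len
        p_coll = collection_counts[t] / collection_len if collection_len else 0.0
        p = lam * p_sent + (1 - lam) * p_coll
        if p > 0:  # terms unseen everywhere contribute nothing (simplification)
            score += weights.get(t, 1.0) * math.log(p)
    return score

# Toy usage: rank candidate answer sentences for a verbose question.
sentences = [["bell", "invented", "the", "telephone", "in", "1876"],
             ["the", "telephone", "changed", "communication"],
             ["bell", "was", "born", "in", "scotland"]]
query = ["who", "invented", "the", "telephone"]

collection_counts = Counter(t for s in sentences for t in s)
collection_len = sum(collection_counts.values())
weights = isf_weights(query, sentences)

ranked = sorted(sentences,
                key=lambda s: score_sentence(query, s, collection_counts,
                                             collection_len, weights),
                reverse=True)
print(ranked[0])  # expected: the sentence containing the answer
```

In this sketch the ISF weights down-weight near-ubiquitous terms such as "the" and emphasize the rarer, content-bearing terms of the question, which is the intuition behind differentiating the core information need from supporting context.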