Effective term weighting for sentence retrieval

  • Authors: Saeedeh Momtazi, Matthew Lease, Dietrich Klakow

  • Affiliations: Spoken Language Systems, Saarland University, Germany; School of Information, University of Texas at Austin; Spoken Language Systems, Saarland University, Germany

  • Venue: ECDL'10: Proceedings of the 14th European Conference on Research and Advanced Technology for Digital Libraries
  • Year: 2010

Abstract

A well-known challenge of information retrieval is how to infer a user's underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core information need from supporting context? We investigate three simple term-weighting schemes for such estimation within the language modeling retrieval paradigm [6]. While the three schemes described are ad hoc, they address a principled estimation problem underlying the standard word unigram model. We also show that these schemes enable better estimation of a state-of-the-art class model based on term clustering [5]. Using a TREC QA dataset, we evaluate the three weighting schemes for both word and class models on the QA subtask of sentence retrieval. Our inverse sentence frequency weighting scheme achieves over 5% absolute improvement in mean average precision for the standard word model and nearly 2% absolute improvement for the class model.
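
The abstract does not give the exact scoring formula, but the following Python sketch illustrates the general idea under common assumptions: query-likelihood sentence retrieval with a Jelinek-Mercer-smoothed word unigram model, where each query term's log-probability contribution is scaled by a weight derived from inverse sentence frequency (ISF). The smoothing parameter, the specific ISF formula, and the function names are illustrative choices, not the authors' published method.

```python
import math
from collections import Counter

def isf_weights(query_terms, sentences):
    """Assumed inverse-sentence-frequency weighting: query terms that occur
    in fewer sentences of the collection receive larger weights (an IDF
    analogue computed over sentences rather than documents)."""
    n = len(sentences)
    weights = {}
    for t in set(query_terms):
        sf = sum(1 for s in sentences if t in s)       # sentence frequency
        weights[t] = math.log((n + 1) / (sf + 0.5))    # hypothetical smoothing constants
    return weights

def score_sentence(query_terms, sentence, collection_counts, collection_len,
                   weights, lam=0.4):
    """Weighted query likelihood under a Jelinek-Mercer-smoothed unigram
    model: sum over query terms of w(t) * log P(t | sentence)."""
    s_counts = Counter(sentence)
    s_len = max(len(sentence), 1)
    score = 0.0
    for t in query_terms:
        p_sent = s_counts[t] / s_len
        p_coll = collection_counts[t] / collection_len if collection_len else 0.0
        p = lam * p_sent + (1 - lam) * p_coll
        if p > 0:  # terms unseen everywhere contribute nothing (simplification)
            score += weights.get(t, 1.0) * math.log(p)
    return score

# Toy usage: rank candidate answer sentences for a verbose question.
sentences = [["bell", "invented", "the", "telephone", "in", "1876"],
             ["the", "telephone", "changed", "communication"],
             ["bell", "was", "born", "in", "scotland"]]
query = ["who", "invented", "the", "telephone"]

collection_counts = Counter(t for s in sentences for t in s)
collection_len = sum(collection_counts.values())
weights = isf_weights(query, sentences)

ranked = sorted(sentences,
                key=lambda s: score_sentence(query, s, collection_counts,
                                             collection_len, weights),
                reverse=True)
print(ranked[0])  # expected: the sentence containing the answer
```

In this sketch the ISF weights down-weight near-ubiquitous terms such as "the" and emphasize the rarer, content-bearing terms of the question, which is the intuition behind differentiating the core information need from supporting context.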