Compact query term selection using topically related text

  • Authors:
  • K. Tamsin Maxwell;W. Bruce Croft

  • Affiliations:
  • University of Edinburgh, Edinburgh, United Kingdom;University of Massachusetts, Amherst, Amherst, Massachusetts, USA

  • Venue:
  • Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many recent and highly effective retrieval models for long queries use query reformulation methods that jointly optimize term weights and term selection. These methods learn using word context and global context but typically fail to capture query context. In this paper, we present a novel term ranking algorithm, PhRank, that extends work on Markov chain frameworks for query expansion to select compact and focused terms from within a query itself. This focuses queries so that one to five terms in an unweighted model achieve better retrieval effectiveness than weighted term selection models that use up to 30 terms. PhRank terms are also typically compact and contain 1-2 words compared to competing models that use query subsets up to 7 words long. PhRank captures query context with an affinity graph constructed using word co-occurrence in pseudo-relevant documents. A random walk of the graph is used for term ranking in combination with discrimination weights. Empirical evaluation using newswire and web collections demonstrates that performance of reformulated queries is significantly improved for long queries and at least as good for short, keyword queries compared to highly competitive information retrieval (IR) models.