Utilizing passage-based language models for ad hoc document retrieval

  • Authors:
  • Michael Bendersky; Oren Kurland

  • Affiliations:
  • Department of Computer Science, Center for Intelligent Information Retrieval, University of Massachusetts Amherst, Amherst, USA; Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa, Israel

  • Venue:
  • Information Retrieval
  • Year:
  • 2010

Abstract

In the ad hoc retrieval setting, a document relevant to a query may contain only a few short passages with query-related information; passage-based document ranking approaches were proposed to cope with this. We show that several of these retrieval methods can be understood, and new ones derived, within a single probabilistic model. We use language-model estimates to instantiate specific retrieval algorithms and, in doing so, present a novel passage language model that integrates information from the containing document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we present yield passage language models that are more effective than the standard passage model, both for basic document retrieval and for constructing and utilizing passage-based relevance models; these relevance models also outperform a document-based relevance model. Finally, we demonstrate the merits of using the document-homogeneity measures to integrate document-query and passage-query similarity information for document retrieval.
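
The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a linear mixture of passage, document, and corpus unigram models, a hypothetical entropy-based homogeneity score in [0, 1], fixed-size non-overlapping passages, and document scoring by the best passage. The names `entropy_homogeneity`, `passage_model`, `score_document`, and the parameters `lam_corpus` and `size` are all illustrative assumptions.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def entropy_homogeneity(doc_tokens):
    """Hypothetical homogeneity score: 1 minus normalized term entropy.

    Returns 1.0 for a document with at most one distinct term and
    0.0 when all distinct terms are equally frequent.
    """
    probs = list(mle(doc_tokens).values())
    if len(probs) <= 1:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(probs))


def passage_model(passage_tokens, doc_tokens, corpus_model, lam_corpus=0.1):
    """Assumed mixture: the more homogeneous the document, the more
    weight its model receives relative to the passage model."""
    h = entropy_homogeneity(doc_tokens)
    p_psg, p_doc = mle(passage_tokens), mle(doc_tokens)

    def prob(term):
        local = (1 - h) * p_psg.get(term, 0.0) + h * p_doc.get(term, 0.0)
        # Corpus smoothing; 1e-9 is an arbitrary floor for unseen terms.
        return (1 - lam_corpus) * local + lam_corpus * corpus_model.get(term, 1e-9)

    return prob


def score_document(query_tokens, doc_tokens, corpus_model, size=50):
    """Score a document by the query log-likelihood of its best passage."""
    best = float("-inf")
    for start in range(0, len(doc_tokens), size):
        model = passage_model(doc_tokens[start:start + size], doc_tokens, corpus_model)
        best = max(best, sum(math.log(model(q)) for q in query_tokens))
    return best
```

Under these assumptions, a highly heterogeneous document is scored almost entirely by its passages, while a homogeneous one is scored mostly by its document-level model; other homogeneity measures could be swapped in for `entropy_homogeneity` without changing the rest of the sketch.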