Evaluating text representations for retrieval of the best group of documents

  • Authors:
  • Xiaoyong Liu; W. Bruce Croft

  • Affiliations:
  • CIIR, Computer Science Department, University of Massachusetts, Amherst, MA; CIIR, Computer Science Department, University of Massachusetts, Amherst, MA

  • Venue:
  • ECIR'08: Proceedings of the 30th European Conference on IR Research, Advances in Information Retrieval
  • Year:
  • 2008

Abstract

Cluster retrieval assumes that the probability of relevance of a document should depend on the relevance of other similar documents to the same query. The goal is to find the best group of documents. Many studies have examined the effectiveness of this approach by employing different retrieval methods or clustering algorithms, but few have investigated text representations. This paper revisits the problem of retrieving the best group of documents from a language-modeling perspective. We analyze the advantages and disadvantages of a range of representation techniques, derive features that characterize good document groups, and experiment with a new probabilistic representation as a first step toward incorporating these features. Empirical evaluation demonstrates that the relationships between documents can be leveraged in retrieval when a good representation technique is available, and that retrieving the best group of documents can be more effective than retrieving individual documents.
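
To make the idea of ranking groups of documents with a language model concrete, here is a minimal sketch in Python. It is not the representation proposed in the paper, which the abstract does not specify. It assumes one common baseline: each cluster is represented by the concatenation of its documents, and clusters are ranked by query likelihood with Dirichlet smoothing. The function names, the toy clusters, and the smoothing parameter `mu` are all hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def cluster_query_log_likelihood(query, cluster_docs, collection_counts,
                                 collection_len, mu=1000.0):
    """Return log P(query | cluster) under a Dirichlet-smoothed unigram model.

    Assumption: the cluster is represented as the concatenation of its
    member documents (each document is a list of tokens).
    """
    counts = Counter()
    for doc in cluster_docs:
        counts.update(doc)
    cluster_len = sum(counts.values())

    score = 0.0
    for term in query:
        # Background (collection) probability of the term.
        p_coll = collection_counts.get(term, 0) / collection_len if collection_len else 0.0
        # Dirichlet-smoothed cluster probability of the term.
        p = (counts[term] + mu * p_coll) / (cluster_len + mu)
        if p == 0.0:
            # Term never occurs in the collection: the likelihood is zero.
            return float("-inf")
        score += math.log(p)
    return score


def rank_clusters(query, clusters, mu=1000.0):
    """Rank clusters (name -> list of tokenized documents) by query likelihood."""
    collection_counts = Counter()
    for docs in clusters.values():
        for doc in docs:
            collection_counts.update(doc)
    collection_len = sum(collection_counts.values())

    scored = [
        (name, cluster_query_log_likelihood(query, docs, collection_counts,
                                            collection_len, mu))
        for name, docs in clusters.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented purely for illustration.
clusters = {
    "retrieval": [["cluster", "retrieval", "language", "model"],
                  ["document", "retrieval", "query"]],
    "cooking": [["pasta", "sauce", "recipe"],
                ["bread", "oven", "recipe"]],
}
query = ["retrieval", "query"]
ranking = rank_clusters(query, clusters, mu=10.0)
best_cluster = ranking[0][0]
```

Concatenation is only one possible cluster representation. It lets long documents dominate the cluster model, which is the kind of weakness that motivates comparing alternative representations.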