Modelling efficient novelty-based search result diversification in metric spaces

  • Authors:
  • Veronica Gil-Costa;Rodrygo L. T. Santos;Craig Macdonald;Iadh Ounis

  • Affiliations:
  • Yahoo! Research, Santiago de Chile, Chile and CONICET, Argentina;University of Glasgow, UK;University of Glasgow, UK;University of Glasgow, UK

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2013

Abstract

Novelty-based diversification provides a way to tackle ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n^2) document-document comparisons to diversify a ranking of n documents. In this article, we introduce a new approach to novelty-based search result diversification that reduces the overhead incurred by these comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three different approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster. In the third, a novel document is one that has a different signature (i.e., the document's relative distance to a distinguished set of fixed objects called permutants) compared to previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency.
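
The permutation-based idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `signature` and `diversify`, the use of Euclidean distance over coordinate tuples, and the rule that a document counts as novel only when its full permutant ordering has not been seen before are all assumptions made for illustration. Each document is compared against a fixed number of permutants rather than against every previously selected document, which is where the saving over the greedy O(n^2) scheme would come from.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def signature(doc, permutants):
    """Return the permutant indices ordered by increasing distance from doc.

    Assumption: documents and permutants are equal-length coordinate tuples
    and the metric is Euclidean. The abstract only requires a metric space.
    """
    return tuple(sorted(range(len(permutants)),
                        key=lambda i: dist(doc, permutants[i])))


def diversify(ranked_docs, permutants, k):
    """Select up to k documents, in rank order, with pairwise distinct signatures.

    Hypothetical novelty rule: a document is novel if no previously selected
    document has exactly the same signature. Returns positions in ranked_docs.
    """
    seen = set()      # signatures of documents already selected
    selected = []     # positions of the selected documents
    for position, doc in enumerate(ranked_docs):
        sig = signature(doc, permutants)
        if sig in seen:
            continue  # same signature as an earlier pick: treated as redundant
        seen.add(sig)
        selected.append(position)
        if len(selected) == k:
            break
    return selected
```

With m permutants, this sketch computes n * m distances and n set lookups for a ranking of n documents. A practical system would likely compare signatures with a permutation distance rather than exact equality, a choice the abstract leaves open.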