Top-k diversity queries over bounded regions

  • Authors:
  • Ilio Catallo;Eleonora Ciceri;Piero Fraternali;Davide Martinenghi;Marco Tagliasacchi

  • Affiliations:
  • Politecnico di Milano;Politecnico di Milano;Politecnico di Milano;Politecnico di Milano;Politecnico di Milano

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 2013

Abstract

Top-k diversity queries over objects embedded in a low-dimensional vector space aim to retrieve the best k objects that are both relevant to a given user's criteria and well distributed over a designated region. An interesting case is provided by spatial Web objects, which are produced in great quantity by location-based services that let users attach content to places, and which are also found in domains like trip planning, news analysis, and real estate. In this article we present a technique for addressing such queries that, unlike existing methods for diversified top-k queries, does not require accessing and scanning all relevant objects in order to find the best k results. Our Space Partitioning and Probing (SPP) algorithm works by progressively exploring the vector space, while keeping track of the already seen objects and of their relevance and position. The goal is to provide a good-quality result set in terms of both relevance and diversity while minimizing the number of accessed objects. We assess quality by using as a baseline the result set computed by MMR, one of the most popular diversification algorithms. To achieve this, SPP exploits score-based and distance-based access methods, which are available, for instance, in most geo-referenced Web data sources. Experiments with both synthetic and real data show that SPP produces results that are relevant and spatially well distributed, while significantly reducing the number of accessed objects and incurring a very low computational overhead.
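
For orientation, the MMR (Maximal Marginal Relevance) baseline mentioned above can be sketched as a greedy selection over the full object set. This is a minimal illustration, not the article's implementation. The function name `mmr_select`, the trade-off parameter `lam`, the use of Euclidean distance to the nearest selected object as the diversity term, and the assumption that scores and distances are on comparable scales are all hypothetical choices made for this sketch.

```python
import math


def mmr_select(objects, k, lam=0.5):
    """Greedy MMR-style selection (illustrative sketch).

    objects: list of (score, (x, y)) pairs; score is the relevance.
    k:       number of objects to return.
    lam:     assumed trade-off weight; 1.0 means relevance only.
    Returns the indices of the chosen objects, in selection order.
    """
    selected = []
    remaining = set(range(len(objects)))

    def marginal(i):
        score, pos = objects[i]
        if not selected:
            # Nothing chosen yet, so only relevance matters.
            return score
        # Diversity term: distance to the closest already-selected object.
        nearest = min(math.dist(pos, objects[j][1]) for j in selected)
        return lam * score + (1 - lam) * nearest

    while remaining and len(selected) < k:
        # Every remaining object is examined at each step (a full scan).
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(remaining), key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: two high-scoring objects close together, two farther away.
points = [(0.9, (0.0, 0.0)), (0.85, (0.1, 0.0)), (0.5, (1.0, 1.0)), (0.4, (0.0, 1.0))]
balanced = mmr_select(points, k=2, lam=0.5)
relevance_only = mmr_select(points, k=2, lam=1.0)
```

With `lam=0.5` the second pick favors a distant object over the near-duplicate of the first pick, whereas `lam=1.0` reduces to plain top-k by score. Note that each step scans every remaining object, which is the exhaustive access pattern that SPP is designed to avoid.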