Distributed top-k query processing by exploiting skyline summaries

  • Authors: Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg
  • Affiliations: Dept. of Computer Science, NTNU, Trondheim, Norway (all authors)
  • Venue: Distributed and Parallel Databases
  • Year: 2012

Abstract

Recently, a trend has emerged towards supporting rank-aware query operators, such as top-k, that enable users to retrieve only a limited set of the most interesting data objects. As data is now commonly distributed over multiple servers, supporting rank-aware queries in distributed environments is a challenging problem. In this paper, we propose a novel approach, called DiTo, for efficient top-k processing over multiple servers, where each server autonomously stores a fraction of the data. To this end, we exploit the inherent relationship between top-k and skyline objects, and we employ the skyline objects of each server as a data summarization mechanism for efficiently identifying the servers that store top-k results. Relying on a thresholding scheme, DiTo retrieves the top-k result set progressively, while minimizing the number of queried servers and the amount of transferred data. Furthermore, we extend DiTo to support data summaries of bounded size, thus restricting the cost of summary distribution and maintenance. For this extension, we study the challenging problem of finding a fixed-size abstraction of the skyline set that influences the performance of DiTo only slightly. Our experimental evaluation shows that DiTo performs efficiently and provides a viable solution when a high degree of distribution is required.
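
The idea of using per-server skylines as summaries, combined with a threshold, can be illustrated with a small sketch. This is not the DiTo algorithm from the paper, whose details the abstract does not give; it is a simplified illustration under several assumptions: lower attribute values are better, the scoring function is a weighted sum with non-negative weights, and all names (`skyline`, `distributed_top_k`, the `servers` dictionary) are hypothetical. The sketch relies on a known property of monotone scoring functions: the best-scoring object of any dataset is always a skyline object, so the minimum score over a server's skyline equals the best score that server can contribute.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every dimension and better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def score(p: Point, weights: Sequence[float]) -> float:
    """Weighted sum; monotone because weights are assumed non-negative."""
    return sum(w * x for w, x in zip(weights, p))


def distributed_top_k(
    servers: Dict[str, List[Point]], weights: Sequence[float], k: int
) -> Tuple[List[Point], List[str]]:
    """Return the global top-k and the list of servers that were queried.

    Assumption: the coordinator already holds each server's skyline as a
    summary. Here the summaries are computed on the fly for brevity.
    """
    summaries = {name: skyline(data) for name, data in servers.items()}

    # Best score each server can offer, derived from its summary alone.
    bounds = sorted(
        (min(score(p, weights) for p in sky), name)
        for name, sky in summaries.items()
        if sky
    )

    results: List[Point] = []
    contacted: List[str] = []
    for bound, name in bounds:
        # Threshold test: if the current k-th score is already no worse than
        # the best this server (and every later one) can offer, stop.
        if len(results) == k and score(results[-1], weights) <= bound:
            break
        contacted.append(name)
        # Simulates asking the server for its local top-k.
        local = heapq.nsmallest(k, servers[name], key=lambda p: score(p, weights))
        results = heapq.nsmallest(
            k, results + local, key=lambda p: score(p, weights)
        )
    return results, contacted
```

For example, with three hypothetical servers `{"A": [(1, 1), (2, 3), (3, 2)], "B": [(5, 5), (6, 4)], "C": [(1, 4), (4, 1), (3, 3)]}`, equal weights, and `k = 2`, the sketch would query only server A, because the second-best score found there is already no worse than the best score that the summaries of B and C promise. Bounding the summary size, as the abstract describes for the extended DiTo, would make these per-server bounds approximate rather than exact; that case is not covered here.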