Ad-hoc top-k query answering for data streams

  • Authors:
  • Gautam Das;Dimitrios Gunopulos;Nick Koudas;Nikos Sarkas

  • Affiliations:
  • University of Texas at Arlington;University of California, Riverside;University of Toronto;University of Toronto

  • Venue:
  • VLDB '07 Proceedings of the 33rd international conference on Very large data bases
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

A top-k query retrieves the k highest scoring tuples from a data set with respect to a scoring function defined on the attributes of a tuple. The efficient evaluation of top-k queries has been an active research topic and many different instantiations of the problem, in a variety of settings, have been studied. However, techniques developed for conventional, centralized or distributed databases are not directly applicable to highly dynamic environments and on-line applications, like data streams. Recently, techniques supporting top-k queries on data streams have been introduced. Such techniques are restrictive however, as they can only efficiently report top-k answers with respect to a pre-specified (as opposed to ad-hoc) set of queries. In this paper we introduce a novel geometric representation for the top-k query problem that allows us to raise this restriction. Utilizing notions of geometric arrangements, we design and analyze algorithms for incrementally maintaining a data set organized in an arrangement representation under streaming updates. We introduce query evaluation strategies that operate on top of an arrangement data structure that are able to guarantee efficient evaluation for ad-hoc queries. The performance of our core technique is augmented by incorporating tuple pruning strategies, minimizing the number of tuples that need to be stored and manipulated. This results in a main memory indexing technique supporting both efficient incremental updates and the evaluation of ad-hoc top-k queries. A thorough experimental study evaluates the efficiency of the proposed technique.