Top-K aggregate queries on continuous probabilistic datasets

  • Authors: Jianwen Chen; Ling Feng; Jun Zhang

  • Affiliations: Dept. of Computer Science & Technology, Tsinghua University, Beijing, China; Dept. of Computer Science & Technology, Tsinghua University, Beijing, China; Wuhan City, Hubei Prov., China

  • Venue: WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
  • Year: 2013

Abstract

The top-K aggregate query, which ranks groups of tuples by their aggregate values and returns the K groups with the highest aggregates, is a crucial requirement in many domains such as information extraction, data integration, and sensor data processing. In this paper, we formulate top-K aggregate queries for the case where tuple scores are given as continuous probability distributions, and we present algorithms for answering them. To further improve performance, we develop pruning techniques and an adaptive strategy that avoid computing the exact aggregate values of groups that are guaranteed not to be in the top-K. Our experimental study demonstrates the efficiency of these techniques on several datasets with continuous attribute uncertainty.
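
The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that each tuple score is uniformly distributed on an interval [lo, hi], that groups are ranked by the expected SUM of their scores, and that a cheap upper bound (the sum of the interval maxima) is available per group. The function names `expected_sum`, `upper_bound`, and `top_k_groups` are hypothetical.

```python
import heapq


def expected_sum(intervals):
    """Exact aggregate under the uniform assumption: E[U(lo, hi)] = (lo + hi) / 2."""
    return sum((lo + hi) / 2.0 for lo, hi in intervals)


def upper_bound(intervals):
    """Cheap bound: no realisation of the SUM can exceed the sum of the maxima."""
    return sum(hi for _, hi in intervals)


def top_k_groups(groups, k):
    """Return (top-k list of (aggregate, name), number of groups evaluated exactly).

    Groups are visited in descending order of their upper bound. Once the
    bound of the next group is no larger than the k-th best exact aggregate
    found so far, that group and all later ones cannot enter the top-k,
    so their exact aggregates are never computed.
    """
    ordered = sorted(groups.items(), key=lambda kv: upper_bound(kv[1]), reverse=True)
    heap = []  # min-heap holding the k best (aggregate, name) pairs seen so far
    evaluated = 0
    for name, intervals in ordered:
        if len(heap) == k and upper_bound(intervals) <= heap[0][0]:
            break  # pruned: guaranteed not to be in the top-k
        evaluated += 1
        value = expected_sum(intervals)
        if len(heap) < k:
            heapq.heappush(heap, (value, name))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, name))
    return sorted(heap, reverse=True), evaluated


if __name__ == "__main__":
    # Hypothetical data: each group is a list of (lo, hi) score intervals.
    groups = {
        "a": [(8, 10), (6, 8)],
        "b": [(4, 6), (5, 7)],
        "c": [(0, 2), (1, 3)],
        "d": [(2, 12)],
    }
    result, evaluated = top_k_groups(groups, 2)
    print(result, evaluated)
```

In this toy input, group "c" is skipped because its upper bound is already below the second-best exact aggregate. In a realistic setting the exact aggregate of a continuous distribution would be far more expensive than the bound, which is where pruning of this kind would pay off.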