Top-K aggregate queries on continuous probabilistic datasets

Authors:
Jianwen Chen;Ling Feng;Jun Zhang
Affiliations:
Dept. of Computer Science & Technology, Tsinghua University, Beijing, China;Dept. of Computer Science & Technology, Tsinghua University, Beijing, China;Wuhan City, Hubei Prov., China
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

A top-K aggregate query ranks groups of tuples by their aggregate values and returns the K groups with the highest aggregates; it is a crucial requirement in many domains, such as information extraction, data integration, and sensor data processing. In this paper, we formulate top-K aggregate queries for the case where tuple scores are given as continuous probability distributions, and we present algorithms for answering them. To further improve performance, we develop pruning techniques and an adaptive strategy that avoid computing the exact aggregate values of groups that are guaranteed not to be in the top K. Our experimental study shows the efficiency of our techniques on several datasets with continuous attribute uncertainty.
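
The pruning idea can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not spell out. The sketch assumes that each tuple score is uniform on an interval (lo, hi), that the aggregate is a sum, and that groups are ranked by expected aggregate. All function names are hypothetical. A cheap upper bound on each group's aggregate lets the loop stop before evaluating groups that cannot reach the top K.

```python
import heapq


def upper_bound(group):
    # Cheap bound: no realization of the sum can exceed the sum of the hi endpoints.
    return sum(hi for _, hi in group)


def expected_sum(group):
    # Stand-in for the "exact" (possibly expensive) aggregate computation.
    # Assumption: scores are uniform on (lo, hi), so the mean is the midpoint.
    return sum((lo + hi) / 2.0 for lo, hi in group)


def top_k_groups(groups, k):
    """Return ([(name, value), ...] best first, number of exact evaluations).

    groups maps a group name to a list of (lo, hi) score intervals.
    """
    # Visit groups in decreasing order of their upper bound.
    order = sorted(groups, key=lambda g: upper_bound(groups[g]), reverse=True)
    heap = []  # min-heap of (value, name) holding the current best k groups
    evaluated = 0
    for name in order:
        # Once k groups are held, a group whose bound does not exceed the
        # k-th best value cannot enter the result, and neither can any
        # later group, because bounds are visited in decreasing order.
        if len(heap) == k and upper_bound(groups[name]) <= heap[0][0]:
            break
        value = expected_sum(groups[name])
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (value, name))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, name))
    ranked = sorted(heap, reverse=True)
    return [(name, value) for value, name in ranked], evaluated


groups = {
    "a": [(8, 10), (9, 11)],
    "b": [(1, 2), (0, 1)],
    "c": [(5, 7), (4, 6)],
    "d": [(0, 1)],
}
result, evaluated = top_k_groups(groups, 2)
```

In this example only two of the four groups are evaluated exactly; the other two are pruned by their bounds. Ranking by expected value is only one possible semantics for uncertain scores, chosen here to keep the sketch short.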