A truly dynamic data structure for top-k queries on uncertain data

Authors:
Manish Patil;Rahul Shah;Sharma V. Thankachan
Affiliations:
Computer Science Department, Louisiana State University, Baton Rouge, LA;Computer Science Department, Louisiana State University, Baton Rouge, LA;Computer Science Department, Louisiana State University, Baton Rouge, LA
Venue:
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Year:
2011

Citing 20
Cited 0

Comparing top k lists. SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Declarative Data Cleaning: Language, Model, and Algorithms. Proceedings of the 27th International Conference on Very Large Data Bases
Robust and efficient fuzzy match for online data cleaning. Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Evaluating probabilistic queries over imprecise data. Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Approximate quantiles and the order of the stream. Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Data integration: the teenage years. VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Model-driven data acquisition in sensor networks. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Efficient query evaluation on probabilistic databases. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Tight lower bounds for selection in randomly ordered streams. Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Robust lower bounds for communication and stream computation. STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Ranking queries on uncertain data: a probabilistic threshold approach. Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Sliding-window top-k queries on uncertain streams. Proceedings of the VLDB Endowment
Conditioning probabilistic databases. Proceedings of the VLDB Endowment
Efficient Processing of Top-k Queries in Uncertain Databases. ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Semantics of Ranking Queries for Probabilistic Data and Expected Ranks. ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
MayBMS: a probabilistic database management system. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
PrDB: managing and exploiting rich correlations in probabilistic databases. The VLDB Journal — The International Journal on Very Large Data Bases
A unified approach to ranking in probabilistic databases. Proceedings of the VLDB Endowment
Dynamic structures for top-k queries on uncertain data. ISAAC'07 Proceedings of the 18th international conference on Algorithms and computation
Semantics of Ranking Queries for Probabilistic Data. IEEE Transactions on Knowledge and Data Engineering

Abstract

Top-k queries allow end-users to focus on the most important (top-k) answers amongst those which satisfy the query. In traditional databases, a user-defined score function assigns a score value to each tuple, and a top-k query returns the k tuples with the highest scores. In an uncertain database, the top-k answer depends not only on the scores but also on the membership probabilities of tuples. Several top-k definitions covering different aspects of the score-probability interplay have been proposed in the recent past [20, 13, 6, 18]. Most of the existing work in this research field focuses on developing efficient algorithms for answering top-k queries on static uncertain data. Any change in the underlying data (insertion or deletion of a tuple, or a change in the membership probability or score of a tuple) forces re-computation of query answers. Such re-computations are not practical considering the dynamic nature of data in many applications. In this paper, we propose a truly dynamic data structure that uses the ranking function PRFe(α) proposed by Li et al. [18] under the generally adopted model of x-relations [21]. PRFe can effectively approximate various other top-k definitions on uncertain data based on the value of parameter α. An x-relation consists of a number of x-tuples, where an x-tuple is a set of mutually exclusive tuples (up to a constant number) called alternatives. Each x-tuple in a relation randomly instantiates into one tuple from its alternatives. For an uncertain relation with N tuples, our structure answers top-k queries in O(k log N) time, handles an update in O(log N) time, and takes O(N) space. Finally, we evaluate the practical efficiency of our structure on both synthetic and real data.
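
To make the ranking semantics concrete, the following is a minimal Python sketch of PRFe(α) scoring over an x-relation. It is a static, from-scratch recomputation, not the dynamic structure proposed in the paper, and it does not achieve the stated bounds. It assumes the usual definition of PRFe, in which the score of a tuple t is the sum over ranks i of α^i times the probability that t appears at rank i in a possible world, and it further assumes a real α and distinct scores. The function names and the (tuple_id, score, prob) layout are hypothetical, chosen only for illustration.

```python
from itertools import product


def prfe_scores(xtuples, alpha):
    """PRFe(alpha) value of every tuple in an x-relation (illustrative sketch).

    xtuples: list of x-tuples; each x-tuple is a list of mutually exclusive
    alternatives (tuple_id, score, prob) whose probabilities sum to at most 1.
    Assumes distinct scores and a real alpha.
    """
    # Flatten to (score, tuple_id, prob, x-tuple index) and sort by score, highest first.
    items = [(score, tid, prob, x)
             for x, alts in enumerate(xtuples)
             for (tid, score, prob) in alts]
    items.sort(key=lambda item: -item[0])

    # higher_mass[y]: total probability of alternatives of x-tuple y seen so far,
    # i.e. of alternatives that outscore the tuple currently being processed.
    higher_mass = [0.0] * len(xtuples)
    result = {}
    for score, tid, prob, x in items:
        value = alpha * prob  # the tuple itself must appear and occupies one rank
        for y, mass in enumerate(higher_mass):
            if y != x:  # siblings in the same x-tuple cannot co-occur with it
                value *= (1.0 - mass) + alpha * mass
        result[tid] = value
        higher_mass[x] += prob
    return result


def prfe_bruteforce(xtuples, alpha):
    """Same quantity by enumerating all possible worlds (exponential; for checking)."""
    choices = []
    for alts in xtuples:
        options = [(alt, alt[2]) for alt in alts]
        # The x-tuple may also contribute no tuple at all.
        options.append((None, max(0.0, 1.0 - sum(p for _, _, p in alts))))
        choices.append(options)

    result = {tid: 0.0 for alts in xtuples for (tid, _, _) in alts}
    for world in product(*choices):
        world_prob = 1.0
        present = []
        for alt, p in world:
            world_prob *= p
            if alt is not None:
                present.append(alt)
        present.sort(key=lambda alt: -alt[1])
        for rank, (tid, _, _) in enumerate(present, start=1):
            result[tid] += world_prob * alpha ** rank
    return result


def top_k(xtuples, alpha, k):
    """Ids of the k tuples with the largest PRFe(alpha) value."""
    scores = prfe_scores(xtuples, alpha)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `top_k(xtuples, alpha, k)` re-evaluates every tuple, so any insertion, deletion, or probability change costs a full recomputation. That cost is exactly what the abstract argues a dynamic structure should avoid. The brute-force routine is included only as a correctness check on small inputs.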