Robust ranking of uncertain data

Authors:
Da Yan;Wilfred Ng
Affiliations:
The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong;The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Venue:
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I
Year:
2011

Abstract

Numerous real-life applications continually generate huge amounts of uncertain data (e.g., sensor or RFID readings). As a result, top-k queries, which return only the k most promising probabilistic tuples, have become an important means of monitoring and analyzing such data. These "top" tuples should have both a high score in terms of some ranking function and a high occurrence probability. Previous work on ranking semantics is not entirely satisfactory: existing semantics either require user-specified parameters other than k, cannot be evaluated efficiently in real time, or even generate results that violate the underlying probability model. To overcome these deficiencies, we propose a new semantics called U-Popk, based on a simpler but more fundamental property inherent in the underlying probability model. We then develop an efficient algorithm to evaluate U-Popk. Extensive experiments confirm that U-Popk ensures high ranking quality and supports efficient evaluation of top-k queries on probabilistic tuples.
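
To make the tension between a high score and a high occurrence probability concrete, here is a minimal Python sketch. It is not the U-Popk semantics or the paper's algorithm, neither of which the abstract specifies. It only computes, for each tuple, the probability of being ranked first, under two assumptions: tuples exist independently of one another, and scores are distinct. The names `ProbTuple` and `top1_probabilities` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbTuple:
    """A probabilistic tuple (hypothetical type, for illustration only)."""
    name: str
    score: float  # value of the ranking function
    prob: float   # probability that the tuple actually occurs


def top1_probabilities(tuples):
    """Map each tuple name to its probability of being ranked first.

    Assumes tuples occur independently and have distinct scores.
    A tuple is ranked first exactly when it occurs and no
    higher-scored tuple occurs.
    """
    ranked = sorted(tuples, key=lambda t: t.score, reverse=True)
    result = {}
    none_above = 1.0  # probability that no higher-scored tuple occurs
    for t in ranked:
        result[t.name] = t.prob * none_above
        none_above *= 1.0 - t.prob
    return result


readings = [
    ProbTuple("a", score=90.0, prob=0.3),
    ProbTuple("b", score=80.0, prob=0.9),
    ProbTuple("c", score=70.0, prob=0.5),
]
top1 = top1_probabilities(readings)
```

With these made-up readings, tuple `a` has the highest score but is ranked first with probability 0.3, while the lower-scored but more reliable tuple `b` is ranked first with probability 0.9 × 0.7, about 0.63. The remaining probability mass covers `c` (about 0.035) and the case where no tuple occurs at all.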