Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

Authors:
Graham Cormode; Feifei Li; Ke Yi
Venue:
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Year:
2009

Abstract

When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of top-k queries is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k query over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach based on expected ranks: it uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N), no worse than simply sorting the relation. In settings where generating each tuple in turn is costly, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early while guaranteeing that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
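The expected-rank semantics described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation, and every function name is hypothetical. It rests on several assumptions. First, the tuple-level functions assume independent tuples with no exclusion rules. Second, scores are assumed to be distinct. Third, a tuple absent from a world W is assumed to be ranked |W|, that is, after every tuple that is present.

The sketch contains four pieces:
- `expected_ranks_bruteforce` enumerates possible worlds directly, so it is useful only for checking.
- `expected_ranks_tuple_level` reduces the computation to a sort plus a prefix sum, which is consistent with the O(N log N) cost.
- `expected_ranks_attribute_level` assumes discrete score distributions.
- `top_k_with_pruning` stops early using a simple deterministic lower bound that holds for independent tuples when E[|W|] is known. It does not reproduce the probabilistic tail bounds the paper describes.

```python
import heapq
from itertools import product


def expected_ranks_bruteforce(tuples):
    """Independent tuples [(score, prob), ...]; enumerates all 2^N worlds (checking only)."""
    n = len(tuples)
    ranks = [0.0] * n
    for present in product([False, True], repeat=n):
        pr = 1.0
        for (_, p), x in zip(tuples, present):
            pr *= p if x else 1.0 - p
        world = [j for j in range(n) if present[j]]
        for i in range(n):
            if present[i]:
                # Rank = number of tuples in this world with a higher score.
                r = sum(1 for j in world if tuples[j][0] > tuples[i][0])
            else:
                # Assumed convention: a tuple absent from W is ranked |W|.
                r = len(world)
            ranks[i] += pr * r
    return ranks


def expected_ranks_tuple_level(tuples):
    """Closed form for independent tuples with distinct scores:
    r(t_i) = p_i * q_i + (1 - p_i) * (E - p_i), where q_i is the probability
    mass of higher-scored tuples and E = E[|W|] is the sum of all p_j."""
    order = sorted(range(len(tuples)), key=lambda i: -tuples[i][0])  # O(N log N)
    total = sum(p for _, p in tuples)
    ranks = [0.0] * len(tuples)
    prefix = 0.0  # running q_i
    for i in order:
        p = tuples[i][1]
        ranks[i] = p * prefix + (1.0 - p) * (total - p)
        prefix += p
    return ranks


def expected_ranks_attribute_level(dists):
    """dists[i] = [(value, prob), ...]; every tuple appears in every world, so
    r(t_i) = sum_v p_{i,v} * (Q(v) - Pr[X_i > v]) with Q(v) = sum_j Pr[X_j > v]."""
    pairs = sorted((vp for d in dists for vp in d), key=lambda vp: -vp[0])
    mass_above = {}  # v -> Q(v), filled by one sweep in decreasing value order
    acc, idx = 0.0, 0
    while idx < len(pairs):
        v = pairs[idx][0]
        mass_above[v] = acc
        while idx < len(pairs) and pairs[idx][0] == v:
            acc += pairs[idx][1]
            idx += 1
    ranks = []
    for d in dists:
        r = 0.0
        for v, p in d:
            self_tail = sum(pp for vv, pp in d if vv > v)  # Pr[X_i > v]
            r += p * (mass_above[v] - self_tail)
        ranks.append(r)
    return ranks


def top_k(ranks, k):
    """Indices of the k tuples with the smallest expected rank."""
    return sorted(range(len(ranks)), key=lambda i: ranks[i])[:k]


def top_k_with_pruning(stream, k, expected_world_size):
    """stream yields (tuple_id, prob) in decreasing score order; E[|W|] is assumed
    known. Seen tuples get exact expected ranks; any unseen tuple u has
    r(u) >= q (probability mass seen so far), since q_u >= q and E - p_u >= q."""
    best = []  # max-heap (negated ranks) holding the k smallest ranks seen
    q, read = 0.0, 0
    for tid, p in stream:
        r = p * q + (1.0 - p) * (expected_world_size - p)
        heapq.heappush(best, (-r, tid))
        if len(best) > k:
            heapq.heappop(best)  # drop the largest of the k+1 ranks
        q += p
        read += 1
        if len(best) == k and -best[0][0] < q:
            break  # no unseen tuple can beat the current k-th expected rank
    return [tid for _, tid in sorted(best, reverse=True)], read
```

For example, with `data = [(100, 0.4), (80, 0.9), (50, 0.6)]`, `expected_ranks_tuple_level(data)` returns approximately `[0.9, 0.46, 1.3]`, so `top_k(..., 2)` is `[1, 0]`. The tuple scored 80 outranks the tuple scored 100 because it is far more likely to exist. The closed form avoids enumerating worlds because, under independence, a present tuple's expected rank depends only on the probability mass above it. An absent tuple's expected rank is simply E[|W|] minus its own probability.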