When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied extensively in traditional database settings. Top-k queries are perhaps even more important in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, each of these lacks some of the intuitive properties of a top-k query over deterministic data. Specifically, we define a number of fundamental properties, namely exact-k, containment, unique-rank, value-invariance, and stability, all of which are satisfied by ranking queries on certain data. We argue that any reasonable definition of ranking over uncertain data should satisfy these properties as well. Unfortunately, none of the existing definitions satisfies all of them. To remedy this shortcoming, this work proposes an intuitive new approach, the expected rank, which ranks tuples by the well-founded notion of each tuple's expected rank across all possible worlds. We prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N), no worse than simply sorting the relation. In settings where generating each tuple in turn is expensive, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early while guaranteeing that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
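To make the idea of ranking by expected rank concrete, the following minimal Python sketch works through a simplified tuple-level setting. It is an illustration under stated assumptions, not the paper's algorithm: tuples are assumed independent (no exclusion rules), scores are assumed distinct, a tuple's rank in a world is taken to be the number of present tuples with a higher score, and an absent tuple is assigned the rank |W| (the size of that world). The function names are hypothetical. The sketch compares a brute-force enumeration of all possible worlds with a sort-plus-running-sum computation that runs in O(N log N) under these assumptions.

```python
from itertools import product


def expected_ranks_bruteforce(tuples):
    """Expected rank of each (score, prob) tuple by enumerating all 2^N worlds.

    Assumptions (not taken from the abstract): independent tuples, distinct
    scores, rank = number of present tuples with a higher score, and an
    absent tuple gets rank |W|.
    """
    n = len(tuples)
    exp = [0.0] * n
    for mask in product([0, 1], repeat=n):
        # Probability of this world under tuple independence.
        weight = 1.0
        for present, (_, p) in zip(mask, tuples):
            weight *= p if present else 1.0 - p
        size = sum(mask)
        for i, (score_i, _) in enumerate(tuples):
            if mask[i]:
                rank = sum(
                    1 for j, (s, _) in enumerate(tuples) if mask[j] and s > score_i
                )
            else:
                rank = size
            exp[i] += weight * rank
    return exp


def expected_ranks_sorted(tuples):
    """Same quantity via one sort and a running sum, O(N log N)."""
    n = len(tuples)
    order = sorted(range(n), key=lambda i: -tuples[i][0])  # descending score
    total = sum(p for _, p in tuples)  # expected world size
    exp = [0.0] * n
    higher = 0.0  # sum of probabilities of tuples with a higher score
    for i in order:
        p = tuples[i][1]
        # Present: expected count of higher-scoring tuples that also appear.
        # Absent: expected size of the world formed by the other tuples.
        exp[i] = p * higher + (1.0 - p) * (total - p)
        higher += p
    return exp


def top_k_by_expected_rank(tuples, k):
    """Indices of the k tuples with the smallest expected rank."""
    ranks = expected_ranks_sorted(tuples)
    return sorted(range(len(tuples)), key=lambda i: ranks[i])[:k]


if __name__ == "__main__":
    relation = [(100, 0.4), (90, 0.9), (80, 0.5)]  # hypothetical data
    print(expected_ranks_sorted(relation))
    print(top_k_by_expected_rank(relation, 2))
```

In the hypothetical three-tuple relation, the highest-scoring tuple is unlikely to exist, so the second tuple obtains a smaller expected rank and is returned first. Because every tuple receives a single number and the k smallest are returned, the answer always has exactly k entries and the top-k is always contained in the top-(k+1) under this sketch's assumptions.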