Semantics of Ranking Queries for Probabilistic Data

Authors:
Jeffrey Jestes; Graham Cormode; Feifei Li; Ke Yi
Affiliations:
Florida State University, Tallahassee; AT&T Labs-Research, Florham Park; Florida State University, Tallahassee; Hong Kong University of Science and Technology, Hong Kong
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2011

Abstract

Recently, there have been several attempts to propose definitions and algorithms for ranking queries on probabilistic data. However, these lack many intuitive properties of top-k queries over deterministic data. We define several fundamental properties, including exact-k, containment, unique rank, value invariance, and stability, which are satisfied by ranking queries on certain (deterministic) data. We argue that these properties should also be carefully studied when defining ranking queries on probabilistic data, and that, for most applications, a definition for ranking uncertain data should fulfill them. We propose an intuitive new ranking definition based on the observation that the ranks of a tuple across all possible worlds represent a well-founded rank distribution. We study ranking definitions based on the expectation, the median, and other statistics of this rank distribution for a tuple, and derive the expected rank, median rank, and quantile rank correspondingly. We prove that the expected rank, median rank, and quantile rank satisfy all these properties for a ranking query. We provide efficient solutions to compute such rankings across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
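To make the idea of a rank distribution concrete, the following is a minimal brute-force sketch in Python for the attribute-level model. It is an illustration only, not the efficient algorithm the paper refers to: it enumerates every possible world, so it is exponential in the number of tuples. Several details are assumptions not stated in the abstract: ranks are 0-based, a tuple's rank in a world is the number of tuples with a strictly higher score, each tuple's score is drawn independently from a small discrete distribution, and ties in the final ordering are broken by tuple name. The function names and the toy data are hypothetical.

```python
from itertools import product


def rank_distributions(tuples):
    """Return {name: {rank: probability}} by enumerating all possible worlds.

    `tuples` maps a tuple name to a list of (score, probability) choices.
    Assumption: choices are independent across tuples (attribute-level model).
    """
    names = list(tuples)
    dists = {name: {} for name in names}
    # Each possible world picks exactly one (score, prob) choice per tuple.
    for world in product(*(tuples[name] for name in names)):
        world_prob = 1.0
        for _, prob in world:
            world_prob *= prob
        scores = [score for score, _ in world]
        for i, name in enumerate(names):
            # Assumed convention: rank = number of strictly higher scores.
            rank = sum(1 for s in scores if s > scores[i])
            dists[name][rank] = dists[name].get(rank, 0.0) + world_prob
    return dists


def expected_rank(dist):
    """Expectation of a rank distribution."""
    return sum(rank * prob for rank, prob in dist.items())


def quantile_rank(dist, phi=0.5):
    """Smallest rank whose cumulative probability reaches phi (median if 0.5)."""
    cumulative = 0.0
    for rank in sorted(dist):
        cumulative += dist[rank]
        if cumulative >= phi - 1e-12:  # small tolerance for float rounding
            return rank
    return max(dist)


def top_k(tuples, k, stat=expected_rank):
    """Order tuples by the chosen statistic of their rank distribution."""
    dists = rank_distributions(tuples)
    return sorted(dists, key=lambda name: (stat(dists[name]), name))[:k]


# Hypothetical toy data: three tuples with uncertain scores.
toy = {
    "t1": [(100, 0.4), (70, 0.6)],
    "t2": [(92, 0.6), (80, 0.4)],
    "t3": [(85, 1.0)],
}

if __name__ == "__main__":
    print(top_k(toy, 2))                      # ['t2', 't3']
    print(top_k(toy, 2, stat=quantile_rank))  # ['t2', 't3']
```

In this toy instance the expected ranks come out to roughly 0.8 for t2, 1.0 for t3, and 1.2 for t1, so exactly two tuples are returned for k = 2, and the top-1 answer is contained in the top-2 answer.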