A unified approach to ranking in probabilistic databases

Authors:
Jian Li;Barna Saha;Amol Deshpande
Affiliations:
Computer Science Department, University of Maryland, College Park, USA 20742;Computer Science Department, University of Maryland, College Park, USA 20742;Computer Science Department, University of Maryland, College Park, USA 20742
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2011

Citing 40
Cited 6

A probabilistic relational algebra for the integration of information retrieval and database systems

ACM Transactions on Information Systems (TOIS)
ProbView: a flexible probabilistic database system

ACM Transactions on Database Systems (TODS)
Rank aggregation methods for the Web

Proceedings of the 10th international conference on World Wide Web
Cumulated gain-based evaluation of IR techniques

ACM Transactions on Information Systems (TOIS)
Comparing top k lists

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Evaluating probabilistic queries over imprecise data

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Learning to rank using gradient descent

ICML '05 Proceedings of the 22nd international conference on Machine learning
Working Models for Uncertain Data

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Clean Answers over Dirty Databases: A Probabilistic Approach

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Creating probabilistic databases from information extraction models

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Management of probabilistic data: foundations and challenges

Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Provenance semirings

Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient query evaluation on probabilistic databases

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Data integration with uncertainty

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Ranking queries on uncertain data: a probabilistic threshold approach

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Event queries on correlated probabilistic streams

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A survey of top-k query processing techniques in relational database systems

ACM Computing Surveys (CSUR)
Sliding-window top-k queries on uncertain streams

Proceedings of the VLDB Endowment
Conditioning probabilistic databases

Proceedings of the VLDB Endowment
Efficient search for the top-k probable nearest neighbors in uncertain databases

Proceedings of the VLDB Endowment
Learning to create data-integrating queries

Proceedings of the VLDB Endowment
Evaluating probability threshold k-nearest-neighbor queries over uncertain data

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Multiple intents re-ranking

Proceedings of the forty-first annual ACM symposium on Theory of computing
Efficient Processing of Top-k Queries in Uncertain Databases

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Ranking with Uncertain Scores

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Efficient Query Evaluation over Temporally Correlated Probabilistic Streams

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Consensus answers for queries over probabilistic databases

Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Top-k queries on uncertain data: on score distribution and typical answers

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Indexing correlated probabilistic databases

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Learning to Rank for Information Retrieval

Foundations and Trends in Information Retrieval
PrDB: managing and exploiting rich correlations in probabilistic databases

The VLDB Journal — The International Journal on Very Large Data Bases
Probabilistic nearest-neighbor query on uncertain objects

DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
Transducing Markov sequences

Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Approximation algorithms for diversified search ranking

ICALP'10 Proceedings of the 37th international colloquium conference on Automata, languages and programming: Part II
Ranking continuous probabilistic datasets

Proceedings of the VLDB Endowment
k-selection query over uncertain data

DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
Models for incomplete and probabilistic information

EDBT'06 Proceedings of the 2006 international conference on Current Trends in Database Technology

On pruning for top-k ranking in uncertain databases

Proceedings of the VLDB Endowment
Efficient fuzzy ranking queries in uncertain databases

Applied Intelligence
Finding top k most influential spatial facilities over uncertain objects

Proceedings of the 21st ACM international conference on Information and knowledge management
Applying weighted queries on probabilistic databases

Proceedings of the 21st ACM international conference on Information and knowledge management
A top-k filter for logic-based similarity conditions on probabilistic databases

ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
Top-k best probability queries and semantics ranking properties on probabilistic databases

Data & Knowledge Engineering

Abstract

Ranking is a fundamental operation in data analysis and decision support, and it plays an even more crucial role when the dataset being explored exhibits uncertainty. In recent years, this has led to much work on how to rank the tuples in a probabilistic dataset. In this article, we present a unified approach to ranking and top-k query processing in probabilistic databases by viewing it as a multi-criterion optimization problem and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF^ω and PRF^e, that generalize or can approximate many of the previously proposed ranking functions. We present novel generating-function-based algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and we develop an approach to learn those parameters. Finally, we present a comprehensive experimental study that illustrates the effectiveness of our parameterized ranking functions, especially PRF^e, at approximating other ranking functions, and the scalability of our proposed algorithms for exact or approximate ranking.
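
The abstract does not spell out the ranking functions, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the simplest setting (independent tuples with distinct scores, each present with some probability), and it assumes that a parameterized ranking function scores a tuple t as the sum over ranks i of w(i) * Pr(t is present and ranked i), with w(i) = alpha^i for the exponential variant. The function names `rank_distributions`, `prf_omega`, and `prf_e` are hypothetical. Under these assumptions, the rank distribution of a tuple can be read off the coefficients of the generating function formed by multiplying (1 - p_j + p_j x) over all higher-scoring tuples j.

```python
from typing import Callable, List, Sequence, Tuple

# Each uncertain tuple is (score, existence probability); tuples are assumed
# independent with distinct scores (an assumption of this sketch).
UTuple = Tuple[float, float]


def rank_distributions(tuples: Sequence[UTuple]) -> List[List[float]]:
    """For each tuple, in descending score order, return a list dist where
    dist[j] = Pr(tuple is present and has rank j + 1)."""
    ordered = sorted(tuples, key=lambda t: t[0], reverse=True)
    # coeffs[k] = Pr(exactly k of the higher-scoring tuples are present),
    # i.e. the coefficients of the product of (1 - p_j + p_j * x).
    coeffs = [1.0]
    result = []
    for _, p in ordered:
        result.append([p * c for c in coeffs])
        # Multiply the running polynomial by (1 - p + p * x).
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)
            nxt[k + 1] += c * p
        coeffs = nxt
    return result


def prf_omega(tuples: Sequence[UTuple],
              weight: Callable[[int], float]) -> List[float]:
    """Weighted-rank score per tuple (descending score order):
    sum over ranks i of weight(i) * Pr(rank = i)."""
    return [sum(weight(j + 1) * pr for j, pr in enumerate(dist))
            for dist in rank_distributions(tuples)]


def prf_e(tuples: Sequence[UTuple], alpha: float) -> List[float]:
    """Exponential-weight special case, weight(i) = alpha ** i, computed in
    linear time by evaluating the generating function at x = alpha."""
    ordered = sorted(tuples, key=lambda t: t[0], reverse=True)
    prefix = 1.0  # product of (1 - p_j + p_j * alpha) over higher-scoring j
    values = []
    for _, p in ordered:
        values.append(p * alpha * prefix)
        prefix *= 1.0 - p + p * alpha
    return values


if __name__ == "__main__":
    data = [(100.0, 0.5), (90.0, 0.8), (80.0, 0.4)]
    print(rank_distributions(data))
    print(prf_e(data, alpha=0.5))
    # A step weight (1 for ranks <= k, else 0) gives Pr(rank <= k).
    print(prf_omega(data, lambda i: 1.0 if i <= 2 else 0.0))
```

In this sketch the general weighted form takes quadratic time in the number of tuples, while the exponential form needs only a running product. That contrast is one plausible reading of why an exponential parameterization is convenient for large datasets. Correlated data (and/xor trees, Markov networks) is not handled here.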