Supporting ranking queries on uncertain and incomplete data

Authors:
Mohamed A. Soliman;Ihab F. Ilyas;Shalev Ben-David
Affiliations:
School of Computer Science, University of Waterloo, Waterloo, Canada;School of Computer Science, University of Waterloo, Waterloo, Canada;School of Computer Science, University of Waterloo, Waterloo, Canada
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2010

Abstract

Large databases with uncertain information are becoming more common in many applications, including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes introduces new problems that are fundamentally different from conventional ranking. Specifically, uncertainty in records' scores induces a partial order over records, as opposed to the total order assumed in conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings that arise from score uncertainty. Under this model, we formulate several ranking query types with different semantics, and we describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders under two widely adopted distance metrics. In addition, we design sampling techniques based on Markov chains to compute approximate query answers. An experimental evaluation on both real and synthetic data demonstrates the efficiency and effectiveness of our techniques under various configurations.
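
The following is a minimal illustrative sketch, not the paper's algorithms, of how score uncertainty can induce a partial order and a space of possible rankings. It assumes each record's score is known only as an interval and is distributed uniformly and independently within it; the record names, the intervals, and every function name are hypothetical. It also swaps in plain Monte Carlo sampling of scores to estimate ranking probabilities, which is simpler than the Markov chain sampling mentioned in the abstract.

```python
import random
from collections import Counter
from itertools import permutations

# Hypothetical records: each score is only known to lie in [lo, hi].
RECORDS = {"a": (6.0, 9.0), "b": (4.0, 7.0), "c": (1.0, 3.0)}


def partial_order(records=RECORDS):
    """Pairs (x, y) where x certainly outranks y: x's lowest score >= y's highest."""
    return {
        (x, y)
        for x in records
        for y in records
        if x != y and records[x][0] >= records[y][1]
    }


def linear_extensions(records=RECORDS):
    """All total rankings consistent with the partial order (brute force)."""
    order = partial_order(records)
    extensions = []
    for perm in permutations(sorted(records)):
        position = {r: i for i, r in enumerate(perm)}
        if all(position[x] < position[y] for x, y in order):
            extensions.append(perm)
    return extensions


def estimate_ranking_probs(records=RECORDS, samples=20000, seed=0):
    """Estimate each ranking's probability by sampling scores.

    Assumption: scores are independent and uniform over their intervals.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        drawn = {r: rng.uniform(lo, hi) for r, (lo, hi) in records.items()}
        ranking = tuple(sorted(drawn, key=drawn.get, reverse=True))
        counts[ranking] += 1
    return {ranking: n / samples for ranking, n in counts.items()}


if __name__ == "__main__":
    print(sorted(partial_order()))
    print(linear_extensions())
    print(estimate_ranking_probs())
```

In this toy input, records `a` and `b` have overlapping intervals and so remain incomparable, while both certainly outrank `c`. The partial order therefore admits two possible rankings, and the sampled probabilities show that they are not equally likely. Brute-force enumeration is only workable for a handful of records, since the number of linear extensions can grow factorially.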