Querying uncertain data with aggregate constraints

Authors:
Mohan Yang; Haixun Wang; Haiquan Chen; Wei-Shinn Ku
Affiliations:
Shanghai Jiao Tong University, Shanghai, China; Microsoft Research Asia, Beijing, China; Auburn University, Auburn, AL, USA; Auburn University, Auburn, AL, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

Data uncertainty arises in many situations. A common approach to processing queries over uncertain data is to sample many "possible worlds" from the uncertain data and to run the queries against those possible worlds. However, sampling is not a trivial task, because a randomly sampled possible world may not satisfy known constraints imposed on the data. In this paper, we focus on an important category of constraints: aggregate constraints. An aggregate constraint is placed on a set of records rather than on a single record, and a real-life system usually has a large number of aggregate constraints. Finding qualified possible worlds in this scenario is challenging, because tuple-by-tuple sampling rarely produces a qualified possible world and is therefore extremely inefficient. We introduce two approaches for querying uncertain data with aggregate constraints: constraint-aware sampling and Markov chain Monte Carlo (MCMC) sampling. Our experiments show that the new approaches produce high-quality query results at reasonable cost.
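To make the contrast between tuple-by-tuple sampling and MCMC sampling concrete, here is a minimal sketch under a deliberately simplified, hypothetical model. It is not the authors' algorithm. Each uncertain tuple holds one integer value drawn from its own discrete distribution, and a single aggregate constraint fixes SUM over all tuples. The function names (naive_acceptance_rate, initial_world, mcmc_sample), the greedy starting world, and the pairwise ±1 "transfer" move are illustrative assumptions. The sketch also assumes that every tuple's support is a contiguous integer range with strictly positive probabilities, which keeps the chain connected.

```python
import random
from typing import Dict, List

# Hypothetical model (assumption): each uncertain tuple takes an integer value
# from its own distribution {value: probability}. Supports are assumed to be
# contiguous integer ranges with strictly positive probabilities.
Dist = Dict[int, float]


def naive_acceptance_rate(dists: List[Dist], target: int, trials: int,
                          rng: random.Random) -> float:
    """Tuple-by-tuple sampling: draw every tuple independently and count how
    often the world satisfies the aggregate constraint SUM(values) == target."""
    hits = 0
    for _ in range(trials):
        world = [rng.choices(list(d), weights=list(d.values()))[0] for d in dists]
        if sum(world) == target:
            hits += 1
    return hits / trials


def initial_world(dists: List[Dist], target: int) -> List[int]:
    """Greedy start: put every tuple at its minimum, then raise tuples toward
    their maximum until the SUM constraint is met."""
    world = [min(d) for d in dists]
    remaining = target - sum(world)
    if remaining < 0:
        raise ValueError("no possible world satisfies the constraint")
    for i, d in enumerate(dists):
        step = min(remaining, max(d) - world[i])
        world[i] += step
        remaining -= step
    if remaining != 0:
        raise ValueError("no possible world satisfies the constraint")
    return world


def mcmc_sample(dists: List[Dist], target: int, n_samples: int, burn_in: int,
                thin: int, rng: random.Random) -> List[List[int]]:
    """Metropolis-Hastings over worlds with SUM(values) == target.

    A move adds +delta to tuple i and -delta to tuple j, so the sum never
    changes. The ordered pair and the sign are chosen uniformly, so the
    proposal is symmetric and the acceptance ratio is just the ratio of world
    probabilities, which only involves tuples i and j."""
    world = initial_world(dists, target)
    n = len(dists)
    samples = []
    for step in range(burn_in + n_samples * thin):
        i, j = rng.sample(range(n), 2)
        delta = rng.choice((-1, 1))
        new_i, new_j = world[i] + delta, world[j] - delta
        p_new = dists[i].get(new_i, 0.0) * dists[j].get(new_j, 0.0)
        p_old = dists[i][world[i]] * dists[j][world[j]]
        # Moves leaving a tuple's support have p_new == 0 and are rejected.
        if p_new > 0 and rng.random() < p_new / p_old:
            world[i], world[j] = new_i, new_j
        if step >= burn_in and (step - burn_in) % thin == 0:
            samples.append(list(world))
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 20 tuples, each uniform over 0..9. The constraint says the
    # total is 150, far above the unconstrained mean of 90.
    dists = [{v: 0.1 for v in range(10)} for _ in range(20)]
    print("naive acceptance rate:",
          naive_acceptance_rate(dists, 150, 10_000, rng))
    samples = mcmc_sample(dists, 150, n_samples=2_000, burn_in=1_000,
                          thin=10, rng=rng)
    # A query over the constrained worlds: estimate P(t0 >= 8 | SUM == 150).
    print("P(t0 >= 8):", sum(s[0] >= 8 for s in samples) / len(samples))
```

In this toy setting, independent sampling almost never lands on a world whose total is 150, whereas every state the chain visits satisfies the constraint by construction, so each retained sample can be used to answer the query. Real systems with many overlapping aggregate constraints need moves that preserve all of them at once, which this single-constraint sketch does not attempt.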