Query processing over incomplete autonomous databases

Authors:
Garrett Wolf;Hemal Khatri;Bhaumik Chokshi;Jianchun Fan;Yi Chen;Subbarao Kambhampati
Affiliations:
Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007

Abstract

Incompleteness due to missing attribute values (also known as "null values") is very common in autonomous web databases, which users typically access through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if those tuples turn out to be relevant to a user query. Ideally, we would like the mediator to retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective. Such challenges include the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce QPIAD, a novel query rewriting and optimization framework that tackles these challenges. Our technique involves reformulating the user query based on mined correlations among the database attributes. The reformulated queries are aimed at retrieving the relevant possible answers in addition to the certain answers. QPIAD is able to gauge the relevance of such queries, which allows tradeoffs that reduce the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies to demonstrate that our approach is able to effectively retrieve relevant possible answers with high precision, high recall, and manageable cost.
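
The rewriting idea in the abstract can be illustrated with a small sketch. It is not the QPIAD implementation: the toy car relation, the attribute names, and the functions `estimate_conditional`, `rewrite_query`, and `run` are all hypothetical, the dependency "model determines body" is simply assumed rather than mined, and a single-attribute conditional frequency stands in for the Naïve Bayes classifier. Selectivity estimates are left out.

```python
from collections import Counter, defaultdict

# Hypothetical toy relation; None marks a missing (null) attribute value.
CARS = [
    {"make": "BMW", "model": "Z4", "body": "Convertible"},
    {"make": "BMW", "model": "Z4", "body": None},
    {"make": "BMW", "model": "Z4", "body": "Convertible"},
    {"make": "Honda", "model": "Civic", "body": "Sedan"},
    {"make": "Honda", "model": "Civic", "body": "Coupe"},
    {"make": "Honda", "model": "Civic", "body": None},
    {"make": "Audi", "model": "A4", "body": "Convertible"},
    {"make": "Audi", "model": "A4", "body": "Sedan"},
    {"make": "Audi", "model": "A4", "body": "Sedan"},
    {"make": "Audi", "model": "A4", "body": None},
]


def estimate_conditional(rows, determining, target):
    """Estimate P(target = v | determining = d) from rows where both are known."""
    counts = defaultdict(Counter)
    for r in rows:
        if r[determining] is not None and r[target] is not None:
            counts[r[determining]][r[target]] += 1
    return {
        d: {v: n / sum(c.values()) for v, n in c.items()}
        for d, c in counts.items()
    }


def rewrite_query(rows, determining, target, wanted):
    """Return the certain answers plus rewritten queries ranked by estimated relevance.

    Each rewritten query asks for tuples whose target attribute is null but
    whose determining attribute matches a value seen among the certain answers.
    """
    certain = [r for r in rows if r[target] == wanted]
    probs = estimate_conditional(rows, determining, target)
    seen = {r[determining] for r in certain if r[determining] is not None}
    rewritten = [
        (probs[d].get(wanted, 0.0), {determining: d, target: None}) for d in seen
    ]
    # Issue the most promising rewritten queries first.
    rewritten.sort(key=lambda pair: pair[0], reverse=True)
    return certain, rewritten


def run(rows, query):
    """Evaluate a conjunctive equality query; a None value means 'is null'."""
    return [r for r in rows if all(r[k] == v for k, v in query.items())]


if __name__ == "__main__":
    certain, rewritten = rewrite_query(CARS, "model", "body", "Convertible")
    print("certain answers:", len(certain))
    for score, query in rewritten:
        print(round(score, 2), query, "->", len(run(CARS, query)), "possible answer(s)")
```

In this sketch a mediator with a limited query budget could stop after the top-ranked rewritten queries, which is one way to read the cost and relevance tradeoff the abstract describes.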