Incompleteness due to missing attribute values (also known as "null values") is very common in autonomous web databases, which users typically access through mediators. Traditional query processing techniques focus on the strict soundness of answer tuples and therefore often ignore tuples with missing values on critical attributes, even when those tuples are relevant to the user query. Ideally, the mediator would retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges to realizing this objective: restricted access privileges on the data, limited support for query patterns, and a bounded pool of database and network resources in the web environment. We introduce QPIAD, a novel query rewriting and optimization framework that tackles these challenges. Our technique reformulates the user query based on mined correlations among the database attributes. The reformulated queries aim to retrieve the relevant possible answers in addition to the certain answers. QPIAD gauges the relevance of the reformulated queries, which allows tradeoffs that reduce the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies demonstrating that our approach effectively retrieves relevant possible answers with high precision, high recall, and manageable cost.
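To make the rewriting idea concrete, the following is a minimal sketch and not the QPIAD implementation. It assumes a small hypothetical car sample (`SAMPLE`), hypothetical attribute names (`model`, `body`), and hypothetical helper names (`afd_confidence`, `rewrite_queries`). It also swaps in plain conditional frequencies where the abstract mentions Naïve Bayes classifiers, and it omits selectivity estimates. Given a selection on an attribute whose value may be null, it issues rewritten queries on a correlated attribute and ranks them by the estimated chance that the missing value matches.

```python
from collections import Counter, defaultdict

# Hypothetical sample drawn from a web database (illustrative data only).
SAMPLE = [
    {"model": "Civic", "body": "Sedan"},
    {"model": "Civic", "body": "Sedan"},
    {"model": "Civic", "body": "Coupe"},
    {"model": "Z4", "body": "Convertible"},
    {"model": "Z4", "body": "Convertible"},
    {"model": "A4", "body": "Sedan"},
    {"model": "A4", "body": "Convertible"},
]


def _group_counts(rows, determining, dependent):
    """Count dependent values per determining value, skipping nulls."""
    groups = defaultdict(Counter)
    for row in rows:
        if row[determining] is not None and row[dependent] is not None:
            groups[row[determining]][row[dependent]] += 1
    return groups


def afd_confidence(rows, determining, dependent):
    """Share of rows that agree with the majority dependent value of their
    group; 1.0 would mean an exact functional dependency on the sample."""
    groups = _group_counts(rows, determining, dependent)
    total = sum(sum(c.values()) for c in groups.values())
    kept = sum(max(c.values()) for c in groups.values())
    return kept / total if total else 0.0


def rewrite_queries(rows, determining, dependent, target):
    """For the selection `dependent = target`, build rewritten queries of the
    form `determining = v AND dependent IS NULL`, ranked by the sample
    estimate of P(dependent = target | determining = v)."""
    groups = _group_counts(rows, determining, dependent)
    ranked = []
    for value, counts in groups.items():
        p = counts[target] / sum(counts.values())
        if p > 0:
            ranked.append((p, value))
    # Highest estimated precision first; ties broken by value for stability.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [({determining: v, dependent: None}, p) for p, v in ranked]


if __name__ == "__main__":
    print(afd_confidence(SAMPLE, "model", "body"))
    for query, p in rewrite_queries(SAMPLE, "model", "body", "Convertible"):
        print(query, p)
```

A mediator following this sketch could send only the top-ranked rewritten queries to the source, which is one way to realize the cost tradeoff the abstract describes. How many queries to send is left open here.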