Incompleteness due to missing attribute values (aka "null values") is very common in autonomous web databases, which users typically access through mediators. Traditional query processing techniques, which focus on the strict soundness of answer tuples, often ignore tuples with critical missing attributes even when those tuples are relevant to the user query. Ideally, we would like the mediator to retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective: the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce QPIAD, a novel query rewriting and optimization framework that tackles these challenges. Our technique reformulates the user query based on mined correlations among the database attributes. The reformulated queries aim to retrieve the relevant possible answers in addition to the certain answers. QPIAD can gauge the relevance of these reformulated queries, which allows tradeoffs that reduce the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies demonstrating that our approach effectively retrieves relevant possible answers with high precision, high recall, and manageable cost.
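To make the rewriting idea concrete, the following is a minimal sketch and not the QPIAD implementation. It assumes a toy in-memory relation (`CARS`), assumes the approximate functional dependency `model ~> body` is already known rather than mined, and uses plain conditional frequencies as a stand-in for the Naïve Bayes classifiers named in the abstract. All names (`CARS`, `estimate_conditional`, `rewrite_query`, `run`) are hypothetical. Given the query `body = convertible`, it collects the certain answers, then issues one rewritten query per determining value seen in those answers, each asking for tuples whose `body` is missing, ordered by the estimated probability that the missing value is the one wanted.

```python
from collections import Counter, defaultdict

# Hypothetical toy relation; None marks a missing (null) attribute value.
CARS = [
    {"make": "Honda", "model": "Accord", "body": "sedan"},
    {"make": "Honda", "model": "Accord", "body": "sedan"},
    {"make": "Honda", "model": "Accord", "body": None},
    {"make": "BMW", "model": "Z4", "body": "convertible"},
    {"make": "BMW", "model": "Z4", "body": "convertible"},
    {"make": "BMW", "model": "Z4", "body": None},
    {"make": "Audi", "model": "A4", "body": "sedan"},
    {"make": "Audi", "model": "A4", "body": "convertible"},
    {"make": "Audi", "model": "A4", "body": None},
]


def estimate_conditional(rows, determining, target):
    """Estimate P(target = v | determining = d) from rows where both are known.

    Assumption: simple relative frequencies stand in for a trained classifier.
    """
    counts = defaultdict(Counter)
    for r in rows:
        if r[determining] is not None and r[target] is not None:
            counts[r[determining]][r[target]] += 1
    return {
        d: {v: n / sum(c.values()) for v, n in c.items()}
        for d, c in counts.items()
    }


def rewrite_query(rows, determining, target, wanted):
    """Return (certain answers, rewritten queries ranked by estimated relevance)."""
    certain = [r for r in rows if r[target] == wanted]
    probs = estimate_conditional(rows, determining, target)
    rewritten = []
    for d in {r[determining] for r in certain}:
        p = probs[d].get(wanted, 0.0)
        # Each rewritten query asks for tuples with this determining value
        # whose target attribute is missing.
        rewritten.append((p, {determining: d, target: None}))
    rewritten.sort(key=lambda pair: pair[0], reverse=True)
    return certain, rewritten


def run(rows, predicate):
    """Evaluate a conjunctive equality predicate; None matches a missing value."""
    return [r for r in rows if all(r[k] == v for k, v in predicate.items())]


if __name__ == "__main__":
    certain, rewritten = rewrite_query(CARS, "model", "body", "convertible")
    print("certain answers:", len(certain))
    for p, q in rewritten:
        print(f"P={p:.2f}", q, "->", run(CARS, q))
```

Because the rewritten queries are ranked, a mediator with a limited query budget could issue only the top few. That is one way to read the cost and relevance tradeoff the abstract describes; the cutoff policy itself is left out of this sketch.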