This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels, and to do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate, either because there are too many of them to create and maintain or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, data from the sources may be extracted using information-extraction techniques and may therefore contain errors. Third, queries to the system may be posed with keywords rather than in a structured form. As a first step toward building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we do not know which one it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We analyze the complexity of query answering in the presence of probabilistic schema mappings and present algorithms for it, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting. Finally, we consider using probabilistic mappings in the scenario of data exchange.
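The difference between the two semantics can be illustrated with a small brute-force sketch. It is not the paper's algorithm. The source rows, attribute names, candidate mappings, and probabilities below are hypothetical, and the by-tuple function enumerates every assignment of mappings to tuples, which is exponential in the number of source tuples and only practical for toy inputs. The query is a simple projection of one target attribute.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Hypothetical source table: one dict per tuple.
source = [
    {"pname": "Alice", "current_addr": "Mountain View", "permanent_addr": "Sunnyvale"},
    {"pname": "Bob", "current_addr": "Sunnyvale", "permanent_addr": "Mountain View"},
]

# Hypothetical probabilistic mapping: candidate mappings from target
# attributes to source attributes, with probabilities that sum to 1.
p_mapping = [
    ({"name": "pname", "mailing_addr": "current_addr"}, 0.6),
    ({"name": "pname", "mailing_addr": "permanent_addr"}, 0.4),
]


def project(rows_with_mappings, target_attr):
    """Distinct values of target_attr, reading each row through its own mapping."""
    return {row[mapping[target_attr]] for row, mapping in rows_with_mappings}


def by_table(rows, candidates, target_attr):
    """One mapping is correct for the whole table; sum probability per answer."""
    probs = defaultdict(float)
    for mapping, p in candidates:
        for value in project([(row, mapping) for row in rows], target_attr):
            probs[value] += p
    return dict(probs)


def by_tuple(rows, candidates, target_attr):
    """Each tuple picks its mapping independently (brute-force enumeration)."""
    probs = defaultdict(float)
    for choice in product(candidates, repeat=len(rows)):
        # Probability of this particular sequence of per-tuple mappings.
        p = prod(q for _, q in choice)
        pairs = [(row, mapping) for row, (mapping, _) in zip(rows, choice)]
        for value in project(pairs, target_attr):
            probs[value] += p
    return dict(probs)


if __name__ == "__main__":
    print("by-table:", by_table(source, p_mapping, "mailing_addr"))
    print("by-tuple:", by_tuple(source, p_mapping, "mailing_addr"))
```

On this toy input, by-table semantics gives both addresses probability 1.0, since each candidate mapping yields both values somewhere in the table. By-tuple semantics gives each address roughly 0.76, because under two of the four mapping sequences both tuples produce the same address.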