This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels, and to do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate, either because there are too many of them to create and maintain or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, queries to the system may be posed with keywords rather than in a structured form. Third, the data from the sources may be extracted using information-extraction techniques and may therefore be imprecise. As a first step toward building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we do not know which one it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the complexity of query answering and algorithms for answering queries in the presence of approximate schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting.
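The difference between the two semantics can be illustrated with a minimal sketch for a single-attribute projection query. Everything concrete here is an assumption made for illustration and does not come from the paper: the source relation `SOURCE`, the candidate mapping `P_MAPPING` for a mediated attribute `phone`, the function names, and the choice to treat tuples as choosing their mappings independently under by-tuple semantics. It is not the paper's algorithm, only a brute-force reading of the two definitions.

```python
from collections import defaultdict

# Hypothetical source relation with two candidate columns for "phone".
SOURCE = [
    {"name": "Alice", "home_phone": "111", "office_phone": "222"},
    {"name": "Bob", "home_phone": "222", "office_phone": "111"},
]

# Hypothetical probabilistic mapping for the mediated attribute "phone":
# (source column, probability). The probabilities sum to 1.
P_MAPPING = [("home_phone", 0.6), ("office_phone", 0.4)]


def by_table(source, p_mapping):
    """By-table: one mapping is correct for the entire source table.

    An answer's probability is the total probability of the mappings
    under which it appears in the query result.
    """
    probs = defaultdict(float)
    for column, p in p_mapping:
        for value in {row[column] for row in source}:
            probs[value] += p
    return dict(probs)


def by_tuple(source, p_mapping):
    """By-tuple: the correct mapping may differ from tuple to tuple.

    Assumes tuples choose mappings independently, so an answer is absent
    only if every tuple fails to produce it.
    """
    miss = defaultdict(lambda: 1.0)  # P(no tuple so far yields the value)
    for row in source:
        hit = defaultdict(float)  # P(this tuple yields the value)
        for column, p in p_mapping:
            hit[row[column]] += p
        for value, p in hit.items():
            miss[value] *= 1.0 - p
    return {value: 1.0 - m for value, m in miss.items()}


def top_k(probs, k):
    """Return the k most probable answers, highest probability first."""
    return sorted(probs.items(), key=lambda item: -item[1])[:k]
```

On this toy input, by-table semantics gives both phone values probability 1, because either mapping returns both values over the whole table. By-tuple semantics gives each value a probability of about 0.76 (that is, 1 - 0.4 * 0.6), because each tuple may pick a different column. The `top_k` helper sorts the full answer set, which is the naive approach; the paper describes a more efficient top-k algorithm that this sketch does not attempt to reproduce.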