Data integration with uncertainty

Authors:
Xin Dong; Alon Y. Halevy; Cong Yu
Affiliations:
University of Washington, Seattle, WA; Google Inc., Mountain View, CA; University of Michigan, Ann Arbor, MI
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007


Abstract

This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels, and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, queries to the system may be posed with keywords rather than in a structured form. Third, the data from the sources may be extracted using information extraction techniques and so may yield imprecise data. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we don't know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of approximate schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting.
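The difference between the two semantics can be illustrated with a small brute-force sketch. It is not the paper's algorithm. It assumes that a probabilistic schema mapping is a set of candidate attribute mappings whose probabilities sum to 1, and that the probability of an answer is the total probability of the mapping choices that produce it. The source rows, attribute names, probabilities, and function names below are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical source table: each row has two candidate address attributes.
SOURCE_ROWS = [
    {"pname": "Alice", "current_addr": "Mountain View", "permanent_addr": "Sunnyvale"},
    {"pname": "Bob", "current_addr": "Sunnyvale", "permanent_addr": "Mountain View"},
]

# Hypothetical probabilistic mapping for the mediated attribute "mailing_addr":
# a list of (mediated attribute -> source attribute, probability) pairs.
P_MAPPING = [
    ({"mailing_addr": "current_addr"}, 0.6),
    ({"mailing_addr": "permanent_addr"}, 0.4),
]


def by_table(rows, p_mapping, attr):
    """One mapping is correct for the whole table; we do not know which."""
    probs = defaultdict(float)
    for mapping, p in p_mapping:
        # Answers to "SELECT attr" if this mapping is the correct one.
        answers = {row[mapping[attr]] for row in rows}
        for answer in answers:
            probs[answer] += p
    return dict(probs)


def by_tuple(rows, p_mapping, attr):
    """Each source tuple may independently follow a different mapping."""
    probs = defaultdict(float)
    # Enumerate every assignment of one mapping per row (exponential; toy only).
    for choice in product(p_mapping, repeat=len(rows)):
        seq_prob = 1.0
        answers = set()
        for row, (mapping, p) in zip(rows, choice):
            seq_prob *= p
            answers.add(row[mapping[attr]])
        for answer in answers:
            probs[answer] += seq_prob
    return dict(probs)
```

On this toy data, `by_table(SOURCE_ROWS, P_MAPPING, "mailing_addr")` assigns probability 1.0 to both cities, because either mapping returns both values. `by_tuple` assigns each city about 0.76, because some per-tuple choices map both rows to the same city. The per-tuple enumeration grows exponentially with the number of rows, so it serves only to make the definitions concrete.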