Efficient management of uncertainty in XML schema matching

Authors:
Jian Gong;Reynold Cheng;David W. Cheung
Affiliations:
Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, People's Republic of China (all three authors)
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2012

Abstract

Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., one derived from COMA++) is likely to be imprecise. In particular, numerous "possible mappings" between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. Second, storing and querying a large set of possible mappings can incur large storage and evaluation overhead. For XML schemas, we observe that their possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree represents the possible mappings compactly and can be generated efficiently. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in the answers with the k highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings.
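To make the PTQ semantics above concrete, the following is a minimal Python sketch of a naive baseline, not the paper's block-tree algorithm. It evaluates the query once per possible mapping and adds up the probabilities of the mappings that produce the same answer. Several simplifying assumptions are not from the paper: each possible mapping is a one-to-one set of (source element, target element) pairs with a probability, a document fragment is flattened into a dict of element values (so the structural constraints of a real twig pattern are ignored), and all names (`common_correspondences`, `evaluate`, `ptq`, `top_k_ptq`) and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical simplified model (not the paper's): a possible mapping is a
# one-to-one set of (source_element, target_element) pairs plus a probability.
Correspondences = FrozenSet[Tuple[str, str]]
Mapping = Tuple[Correspondences, float]
Answer = Tuple[str, ...]


def common_correspondences(mappings: List[Mapping]) -> Correspondences:
    """Pairs shared by every possible mapping: the kind of overlap that a
    compact representation could store once instead of once per mapping."""
    sets = [corr for corr, _ in mappings]
    return frozenset.intersection(*sets) if sets else frozenset()


def evaluate(query: List[str], corr: Correspondences,
             doc: Dict[str, str]) -> Optional[Answer]:
    """Rewrite target-schema query nodes into source elements under one
    mapping and read their values from a flattened fragment; None if unmatched."""
    source_of = {target: source for source, target in corr}
    if any(t not in source_of or source_of[t] not in doc for t in query):
        return None
    return tuple(doc[source_of[t]] for t in query)


def ptq(query: List[str], mappings: List[Mapping],
        doc: Dict[str, str]) -> Dict[Answer, float]:
    """Naive PTQ baseline: an answer's probability is the total probability
    of the possible mappings under which that answer is produced."""
    probs: Dict[Answer, float] = defaultdict(float)
    for corr, p in mappings:
        answer = evaluate(query, corr, doc)
        if answer is not None:
            probs[answer] += p
    return {a: p for a, p in probs.items() if p > 0}


def top_k_ptq(query: List[str], mappings: List[Mapping],
              doc: Dict[str, str], k: int) -> List[Tuple[Answer, float]]:
    """Answers with the k highest probabilities (ties keep first-seen order)."""
    ranked = sorted(ptq(query, mappings, doc).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical example: three possible mappings whose probabilities sum to 1.
mappings: List[Mapping] = [
    (frozenset({("name", "custName"), ("phone", "tel")}), 0.5),
    (frozenset({("fullName", "custName"), ("phone", "tel")}), 0.3),
    (frozenset({("phone", "tel")}), 0.2),  # no match for custName
]
doc = {"name": "Ann", "fullName": "Ann Lee", "phone": "555-0100"}
query = ["custName", "tel"]

print(common_correspondences(mappings))  # frozenset({('phone', 'tel')})
print(top_k_ptq(query, mappings, doc, 1))  # [(('Ann', '555-0100'), 0.5)]
```

Because this baseline re-evaluates the query for every possible mapping, its cost grows with the number of mappings. The correspondence shared by all three sample mappings hints at why exploiting overlap, as the abstract describes for the block tree, may save both storage and query work.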