Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., one derived from COMA++) is likely to be imprecise. In particular, numerous "possible mappings" between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how a divide-and-conquer approach improves the performance of the solution. Second, storing and querying a large set of possible mappings can incur high storage and evaluation overhead. For XML schemas, we observe that possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree represents the possible mappings compactly and can be generated efficiently. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in the answers with the k highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings.
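To make the query semantics concrete, the following is a minimal sketch of a naive baseline, not the block-tree method proposed in the paper. It assumes that each possible mapping carries a probability, that these probabilities sum to one, and that the probability of an answer is the total probability of the mappings that produce it. The element names, the flattened document, and the functions `answer_probabilities` and `top_k` are all hypothetical. The sketch enumerates every mapping, which is the overhead a shared structure such as the block tree aims to avoid.

```python
from collections import defaultdict
import heapq

# Hypothetical possible mappings: (probability, {target element -> source element}).
# The first two mappings overlap on "name", the kind of commonality the
# abstract says possible mappings often exhibit.
POSSIBLE_MAPPINGS = [
    (0.5, {"name": "title", "price": "cost"}),
    (0.3, {"name": "title", "price": "amount"}),
    (0.2, {"name": "label", "price": "cost"}),
]

# Hypothetical source document fragment, flattened to element -> value.
SOURCE_DOC = {"title": "XML Handbook", "label": "Books", "cost": "30", "amount": "25"}


def answer_probabilities(query_elements, mappings, doc):
    """Sum, per answer tuple, the probability of every mapping that yields it."""
    probs = defaultdict(float)
    for prob, mapping in mappings:
        # Skip mappings that cannot translate some query element.
        if not all(e in mapping and mapping[e] in doc for e in query_elements):
            continue
        answer = tuple(doc[mapping[e]] for e in query_elements)
        probs[answer] += prob
    return dict(probs)


def top_k(probs, k):
    """Return the k answers with the highest probabilities, best first."""
    return heapq.nlargest(k, probs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    probs = answer_probabilities(["name", "price"], POSSIBLE_MAPPINGS, SOURCE_DOC)
    for answer, p in top_k(probs, 2):
        print(answer, round(p, 2))
```

Because the first two mappings agree on `name`, a query over `name` alone accumulates their probabilities into a single answer. A real twig query would also match structural (parent-child and ancestor-descendant) constraints, which this flattened sketch omits.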