Information Systems Frontiers
Schema integration aims to create a mediated schema as a unified representation of existing heterogeneous sources that share a common application domain. These sources are increasingly written in XML because of its versatility and expressive power. Unfortunately, they often use different elements and structures to express the same concepts and relations, which causes substantial semantic and structural conflicts. These conflicts impede the creation of high-quality mediated schemas and have not been adequately addressed by existing integration methods. In this paper, we propose a novel method, named XINTOR, for automating the integration of heterogeneous schemas. Given a set of XML sources and a set of correspondences between the source schemas, our method aims to create a complete and minimal mediated schema: one that captures all of the concepts and relations in the sources without duplication, provided that the concepts do not overlap. Our contributions are fourfold. First, we resolve structural conflicts inherent in the source schemas. Second, we introduce a new statistics-based measure, called path cohesion, for selecting the concepts and relations that become part of the mediated schema. Path cohesion is computed statistically from multiple path quality dimensions, such as average path length and path frequency. Third, we resolve semantic conflicts by augmenting the semantics of similar concepts with context-dependent information. Finally, we propose a novel double-layered mediated schema that retains a wider range of concepts and relations than existing mediated schemas, which are at best either complete or minimal, but not both. Experiments on both real and synthetic datasets show that XINTOR outperforms existing methods with respect to (i) mediated-schema quality, measured by precision, recall, F-measure, and schema minimality; and (ii) execution performance, measured by execution time and scale-up behavior.
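As an illustration of the quality measures named in the abstract, the following minimal sketch computes precision, recall, and F-measure for a mediated schema against a reference schema. It assumes, purely for illustration, that each schema is represented as a set of path strings and that an element counts as correct when it appears in both sets. The function name `schema_quality` and the sample paths are hypothetical. The abstract does not specify how XINTOR computes these measures, and schema minimality is omitted here because it is not defined in the abstract.

```python
from typing import Set, Tuple


def schema_quality(generated: Set[str], reference: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of a generated schema.

    Assumption: both schemas are sets of path strings, and a generated
    path is correct if it also occurs in the reference schema.
    """
    correct = len(generated & reference)
    precision = correct / len(generated) if generated else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example data, not taken from the paper.
generated = {"book/title", "book/author", "book/price", "book/isbn"}
reference = {"book/title", "book/author", "book/publisher", "book/isbn", "book/year"}

p, r, f = schema_quality(generated, reference)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this toy case three of the four generated paths are in the five-path reference, so precision is 0.75 and recall is 0.60.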