An XML Schema integration and query mechanism system

  • Authors: Sanjay Madria; Kalpdrum Passi; Sourav Bhowmick

  • Affiliations: Department of Computer Science, University of Missouri-Rolla, Rolla, MO 65401, USA; Department of Mathematics and Computer Science, Laurentian University, Sudbury, ON, Canada P3E 2C6; School of Computer Engineering, Nanyang Technological University, Singapore

  • Venue: Data & Knowledge Engineering
  • Year: 2008

Abstract

The availability of large amounts of heterogeneous, distributed web data necessitates the integration of XML data from multiple XML sources. For example, many e-commerce companies offer similar products but use different XML Schemas, possibly with different ontologies. When two such companies merge, or cooperate to serve customers, they need an integrated schema and query mechanism so that their applications can interoperate. Applications such as comparison shopping need the illusion of a centralized, homogeneous information system. In this paper, we propose an XML Schema integration and querying methodology. We define an object-oriented data model called XSDM (XML Schema Data Model) and present a graphical representation of XML Schema for the purpose of schema integration. We use a three-layered architecture for XML Schema integration, consisting of pre-integration, comparison, and integration; the layers can conceptually be regarded as three phases of the integration process. During pre-integration, the schemas expressed in XML Schema notation are read and converted into XSDM notation. During the comparison phase, correspondences as well as conflicts between elements are identified. During the integration phase, conflict resolution, restructuring, and merging of the initial schemas take place to obtain the global schema. We define integration policies for integrating element definitions as well as their datatypes and attributes. The integrated global schema forms the basis for querying a set of local XML documents. We discuss various strategies for rewriting a global query over the global schema into sub-queries over the local schemas. The sub-queries over the local XML documents are validated against their respective local schemas. This requires identifying and using mapping rules and relationships between the local schemas.
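
As a rough illustration of the three phases and of query rewriting, the following minimal Python sketch may help. It is not the XSDM model or the integration policies of the paper, whose details the abstract does not give. Everything in it is an assumption made for illustration: the two toy schemas, the flat name-to-type dictionary standing in for XSDM, name equality as the only correspondence test, the fallback to xs:string as the only conflict-resolution policy, and the hypothetical function names.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Two hypothetical local schemas from different sources (illustrative only).
SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="price" type="xs:decimal"/>
</xs:schema>"""

SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="price" type="xs:string"/>
  <xs:element name="isbn" type="xs:string"/>
</xs:schema>"""


def pre_integrate(xsd_text):
    """Phase 1: read a schema and reduce it to a {element name: type} dict.

    The flat dict is a stand-in for XSDM, not the paper's data model.
    """
    root = ET.fromstring(xsd_text)
    return {e.get("name"): e.get("type") for e in root.iter(XS + "element")}


def compare(a, b):
    """Phase 2: find name-based correspondences and datatype conflicts."""
    shared = sorted(set(a) & set(b))
    correspondences = [n for n in shared if a[n] == b[n]]
    conflicts = [n for n in shared if a[n] != b[n]]
    return correspondences, conflicts


def integrate(a, b, conflicts):
    """Phase 3: merge both schemas; conflicting types fall back to xs:string.

    The fallback is an assumed toy policy, not one taken from the paper.
    """
    merged = {**a, **b}
    for name in conflicts:
        merged[name] = "xs:string"
    return merged


def rewrite(global_names, local_schemas):
    """Split a global query (a list of element names) into per-source sub-queries."""
    return {
        source: [n for n in global_names if n in schema]
        for source, schema in local_schemas.items()
    }


local_a = pre_integrate(SCHEMA_A)
local_b = pre_integrate(SCHEMA_B)
correspondences, conflicts = compare(local_a, local_b)
global_schema = integrate(local_a, local_b, conflicts)
sub_queries = rewrite(["title", "isbn"], {"A": local_a, "B": local_b})
```

In this toy setting, `title` is a correspondence, `price` is a datatype conflict, and a global query for `title` and `isbn` is split so that source A is asked only for `title`. A realistic system would also have to handle nested structure, attributes, and naming conflicts, which this sketch ignores.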