XML data integration with identification

  • Authors: Antonella Poggi; Serge Abiteboul

  • Affiliations: INRIA Futurs – Parc Club Orsay-University, Orsay, France (both authors)

  • Venue: DBPL'05 Proceedings of the 10th international conference on Database Programming Languages
  • Year: 2005

Abstract

Data integration is the problem of combining data residing at different sources and providing the user with a virtual view, called the global schema, that is independent of the model and the physical origin of the sources. Whereas many data integration systems and theoretical results have been proposed for relational data, XML data integration has received little attention so far. Our goal is therefore to address some of the issues it raises. In particular, we highlight two major issues that emerge in the XML context: (i) the global schema may be characterized by a set of constraints, expressed by means of a DTD and XML integrity constraints; (ii) the concept of node identity requires introducing semantic criteria to identify nodes coming from different sources. We propose a formal framework for XML data integration systems based on an expressive XML global schema, a set of XML data sources, and a set of mappings specified by means of a simple tree language. We then define an identification function that aims at globally identifying nodes coming from different sources. Finally, we propose algorithms to answer queries under different assumptions for the mappings.
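To make the idea of identifying nodes across sources more concrete, here is a minimal Python sketch. It is not the identification function defined in the paper; it only illustrates one possible semantic criterion, namely a key value shared by nodes in different sources. The source documents, the element names, and the choice of `ssn` as a key are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source documents; the element names and the use of
# <ssn> as a key are assumptions for illustration only.
SOURCE_A = "<people><person><ssn>123</ssn><name>Ann</name></person></people>"
SOURCE_B = (
    "<staff>"
    "<person><ssn>123</ssn><phone>555-0100</phone></person>"
    "<person><ssn>456</ssn><name>Bob</name></person>"
    "</staff>"
)


def identify(node, key_tag="ssn"):
    """Return a global identifier built from the node's key value, or None."""
    key = node.findtext(key_tag)
    return None if key is None else (node.tag, key.strip())


def integrate(sources, key_tag="ssn"):
    """Group the child fields of all nodes that share a global identifier."""
    merged = {}
    for xml_text in sources:
        root = ET.fromstring(xml_text)
        for node in root.iter("person"):
            gid = identify(node, key_tag)
            if gid is None:
                continue  # no key value: the node cannot be identified globally
            fields = merged.setdefault(gid, {})
            for child in node:
                # Keep the first value seen for each field.
                fields.setdefault(child.tag, child.text)
    return merged


result = integrate([SOURCE_A, SOURCE_B])
```

In this sketch the two `person` nodes with key `123` come from different sources but receive the same identifier, so their fields end up in a single entry. Keeping the first value seen for a field is an arbitrary simplification; it does not model how a real system would handle conflicting values or constraint violations.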