Semantic integration in Xyleme: a uniform tree-based approach

  • Authors:
  • Claude Delobel; Chantal Reynaud; Marie-Christine Rousset; Jean-Pierre Sirot; Dan Vodislav

  • Affiliations:
  • University of Paris Sud–CNRS (L.R.I.) and INRIA (Futurs), L.R.I., Building 490, 91405, Orsay Cedex, France; University of Paris Sud–CNRS (L.R.I.) and INRIA (Futurs), L.R.I., Building 490, 91405, Orsay Cedex, France; University of Paris Sud–CNRS (L.R.I.) and INRIA (Futurs), L.R.I., Building 490, 91405, Orsay Cedex, France; Xyleme S.A., 6 rue Emile Verhaeren, Saint-Cloud, France; CNAM/CEDRIC, 292 rue Saint-Martin, Paris, France

  • Venue:
  • Data & Knowledge Engineering - Special issue: Data integration over the Web
  • Year:
  • 2003

Abstract

Xyleme is a very large warehouse integrating XML data from the Web. Xyleme uses a simple data model, with data trees and tree types describing the data sources, and a simple query language based on tree queries with Boolean conditions. The main components of the data model are a mediated schema, modeled as an abstract tree type that serves as a view over a set of tree types associated with actual data trees (called concrete tree types), and a mapping expressing the connection between the mediated schema and the concrete tree types. The first contribution of this paper is formal: we provide a declarative model-theoretic semantics for Xyleme tree queries, a way of checking tree query containment, and a characterization of tree queries as a composition of branch queries. The other contributions are algorithmic and address the potentially huge size of the mapping relation, which is a crucial issue for semantic integration and query evaluation in Xyleme. First, we propose a method for pre-evaluating queries at compile time by storing specific meta-information about the mapping in map translation tables. These map translation tables summarize the set of all branch queries that can be generated from the mediated schema and the set of all mappings. Then, we propose several operators and strategies for relaxing queries whose map translation table is empty, since such queries would return no answer if evaluated against the data. Finally, we present a method for semi-automatically generating the mapping relation.
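To make the compile-time pre-evaluation idea more concrete, here is a minimal Python sketch under a deliberately simplified model. Everything in it is an assumption for illustration and is not taken from Xyleme: the function names `branches`, `translation_table`, and `relax_by_dropping`, the `(label, children)` encoding of tree queries, the dictionary encoding of the mapping, and the example DTD names and paths are all hypothetical. The sketch decomposes a tree query into its root-to-leaf branches, keeps only the concrete tree types (DTDs) onto which every branch is mapped, and, when no DTD qualifies, applies one naive relaxation (dropping branches) that is not necessarily among the operators the paper proposes. It ignores Boolean conditions, variables, and the structural consistency checks a real map translation table would require.

```python
from itertools import combinations, product

# Hypothetical toy model (not Xyleme's actual data structures):
# a tree query is a (label, [children]) pair, and the mapping sends each
# abstract root-to-leaf path to a list of (concrete_dtd, concrete_path) pairs.


def branches(query, prefix=()):
    """Decompose a tree query into its root-to-leaf branch paths."""
    label, children = query
    path = prefix + (label,)
    if not children:
        return [path]
    result = []
    for child in children:
        result.extend(branches(child, path))
    return result


def translation_table(branch_list, mapping):
    """Toy map translation table: for each concrete DTD that covers every
    branch, list all combinations of concrete paths (one per branch)."""
    per_branch = []
    for b in branch_list:
        by_dtd = {}
        for dtd, cpath in mapping.get(b, []):
            by_dtd.setdefault(dtd, []).append(cpath)
        per_branch.append(by_dtd)
    # A DTD qualifies only if every branch has at least one image in it.
    if not per_branch:
        return {}
    common = set.intersection(*(set(d) for d in per_branch))
    return {
        dtd: list(product(*(d[dtd] for d in per_branch)))
        for dtd in sorted(common)
    }


def relax_by_dropping(branch_list, mapping):
    """Naive relaxation (an assumption, not necessarily the paper's operators):
    drop as few branches as possible until the table becomes non-empty."""
    for k in range(len(branch_list) - 1, 0, -1):  # keep k branches, largest k first
        for kept in combinations(branch_list, k):
            table = translation_table(list(kept), mapping)
            if table:
                return list(kept), table
    return [], {}


if __name__ == "__main__":
    # Hypothetical mediated-schema query and mapping, for illustration only.
    query = ("museum", [("name", []), ("address", [("city", [])])])
    mapping = {
        ("museum", "name"): [("dtd1", "culture/museum/title"), ("dtd2", "museum/name")],
        ("museum", "address", "city"): [("dtd2", "museum/town")],
    }
    bs = branches(query)
    print(translation_table(bs, mapping))  # {'dtd2': [('museum/name', 'museum/town')]}
    bs.append(("museum", "opening"))  # a branch with no mapping at all
    print(translation_table(bs, mapping))  # {}
    print(relax_by_dropping(bs, mapping))
```

In this toy setting, only `dtd2` can answer the original query because `dtd1` has no image for the `city` branch; once an unmapped branch is added, the table becomes empty and the relaxation recovers the largest answerable subset of branches.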