Scalability of Source Identification in Data Integration Systems

Authors:
François Boisson;Michel Scholl;Imen Sebei;Dan Vodislav
Affiliations:
CNAM/CEDRIC, Paris, France;CNAM/CEDRIC, Paris, France;CNAM/CEDRIC, Paris, France;CNAM/CEDRIC, Paris, France
Venue:
Advanced Internet Based Systems and Applications
Year:
2009

References:
Answering queries using views (extended abstract). PODS '95 Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
XML-based information mediation with MIX. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Data integration: a theoretical perspective. Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Garlic: a new flavor of federated query processing for DB2. Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Querying XML Sources Using an Ontology-Based Mediator. On the Move to Meaningful Internet Systems, 2002 - DOA/CoopIS/ODBASE 2002 Confederated International Conferences DOA, CoopIS and ODBASE 2002
MiniCon: A scalable algorithm for answering queries using views. The VLDB Journal — The International Journal on Very Large Data Bases
Answering queries using views: A survey. The VLDB Journal — The International Journal on Very Large Data Bases
Constraint-based XML query rewriting for data integration. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Determining source contribution in integration systems. Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Abstract

Given a large number of data sources, each indexed by attributes from a predefined set $\mathcal{A}$, and a query q over a subset Q of $\mathcal{A}$ containing k attributes, we are interested in identifying all combinations of sources whose attributes together cover Q. Each combination c may lead to a rewriting of q as a join over the sources in c. Furthermore, to limit redundancy and combinatorial explosion, we require each combination of sources to be a minimal cover of Q. Although this work is motivated by query rewriting in OpenXView [3], an XML data integration system with a large number of XML sources, we believe that the solutions provided in this paper apply to other scalable data integration schemes. We focus on the case where the number of sources is very large while the size of queries is small. We propose a novel algorithm for computing the set of minimal covers of a query and experimentally evaluate its performance.
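
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm proposed in the paper, which the abstract does not describe. It assumes that a cover is minimal when removing any single source leaves some attribute of Q uncovered. The function name `minimal_covers`, the source names, and the attribute names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def minimal_covers(sources, query):
    """Return all minimal covers of `query` as frozensets of source names.

    `sources` maps a (hypothetical) source name to its set of attributes.
    Brute-force illustration only, not the paper's algorithm.
    """
    query = frozenset(query)
    # Keep only the part of each source that is relevant to the query,
    # and discard sources that share no attribute with it.
    relevant = {name: frozenset(attrs) & query for name, attrs in sources.items()}
    relevant = {name: attrs for name, attrs in relevant.items() if attrs}

    def covers(names):
        covered = set()
        for name in names:
            covered |= relevant[name]
        return covered == query

    result = []
    # Assumption: in a minimal cover every source supplies at least one
    # attribute that no other source in the cover supplies, so a minimal
    # cover contains at most k = |Q| sources.
    for size in range(1, len(query) + 1):
        for combo in combinations(sorted(relevant), size):
            if not covers(combo):
                continue
            # Minimal: dropping any single source breaks the cover.
            if all(not covers(combo[:i] + combo[i + 1:]) for i in range(size)):
                result.append(frozenset(combo))
    return result


if __name__ == "__main__":
    example_sources = {
        "s1": {"a", "b"},
        "s2": {"b", "c"},
        "s3": {"c"},
        "s4": {"a", "b", "c"},
        "s5": {"d"},
    }
    for cover in minimal_covers(example_sources, {"a", "b", "c"}):
        print(sorted(cover))
```

The enumeration examines every combination of up to k relevant sources, so its cost grows quickly with the number of sources. This is the combinatorial explosion the abstract refers to, and it is why a dedicated algorithm is needed when the number of sources is very large.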