Compiling Source Descriptions for Efficient and Flexible Information Integration

  • Authors:
  • José Luis Ambite;Craig A. Knoblock;Ion Muslea;Andrew G. Philpot

  • Affiliations:
  • Information Sciences Institute, Integrated Media Systems Center, and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292. ambite ...;Information Sciences Institute, Integrated Media Systems Center, and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292. knoblo ...;Information Sciences Institute, Integrated Media Systems Center, and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292. muslea ...;Information Sciences Institute, Integrated Media Systems Center, and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292. philpo ...

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2001

Abstract

Integrating data from heterogeneous data sources is a critical problem that has received a great deal of attention in recent years. There are two competing approaches to address this problem. The traditional approach, which first appeared in Multibase and more recently in HERMES and TSIMMIS, often called global-as-view, defines the global model as a view on the sources. A more recent approach, sometimes referred to as local-as-view or view rewriting, defines the sources as views on the global model. The disadvantage of the first approach is that a person must re-engineer the definitions of the global model whenever any of the sources change or when new sources are added. The view rewriting approach does not suffer from this drawback, but the problem of rewriting queries into equivalent plans using views is computationally hard and must be performed for each query at run-time.

In this paper we propose a hybrid approach that amortizes the cost of query processing over all queries by pre-compiling the source descriptions into a minimal set of integration axioms. Using this approach, the sources are defined in terms of the global model and then compiled into axioms that define the global model in terms of the sources. These axioms can be efficiently instantiated at run-time to determine the most appropriate rewriting to answer a query and facilitate traditional cost-based query optimization. Our approach combines the flexibility of the local-as-view approach with the run-time efficiency of the query processing in global-as-view systems. We have implemented this approach for the SIMS and Ariadne information mediators and provide empirical results that demonstrate that in practice the approach scales to large numbers of sources and that the approach can compile the axioms for a variety of real-world domains in a matter of seconds.
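
The compile-once, look-up-per-query idea can be illustrated with a deliberately small Python sketch. It is not the SIMS or Ariadne algorithm, which the abstract does not spell out. It assumes a toy setting in which each source is described as a projection of a single global class onto some attributes and all sources share a join key. The source names, the attribute names, and the helper functions `covered_by`, `compile_axioms`, and `plans_for` are hypothetical.

```python
from itertools import combinations

# Hypothetical source descriptions (local-as-view style): each source is
# described as a projection of one global class onto some attributes.
# All sources are assumed to share the key "code", so they can be joined.
SOURCE_DESCRIPTIONS = {
    "src_a": {"code", "name"},
    "src_b": {"code", "city"},
    "src_c": {"code", "name", "city"},
}


def covered_by(sources, descriptions):
    """Union of the global attributes provided by the given sources."""
    attrs = set()
    for s in sources:
        attrs |= descriptions[s]
    return attrs


def compile_axioms(descriptions):
    """Run once: list non-redundant source combinations and what they cover."""
    axioms = []
    names = sorted(descriptions)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = covered_by(combo, descriptions)
            # A combination is redundant if dropping any one source
            # loses no attribute.
            redundant = any(
                covered_by([s for s in combo if s != drop], descriptions) == covered
                for drop in combo
            )
            if not redundant:
                axioms.append((combo, frozenset(covered)))
    return axioms


def plans_for(query_attrs, axioms):
    """Run per query: look up compiled axioms instead of rewriting from scratch."""
    candidates = [set(combo) for combo, covered in axioms if query_attrs <= covered]
    # Keep only plans that do not strictly contain another candidate plan.
    return [p for p in candidates if not any(q < p for q in candidates)]


AXIOMS = compile_axioms(SOURCE_DESCRIPTIONS)
print(plans_for({"name", "city"}, AXIOMS))
```

In this toy setting, a query for `name` and `city` can be answered either by `src_c` alone or by joining `src_a` with `src_b`, and a cost-based optimizer could then choose between the two plans. The exhaustive enumeration in `compile_axioms` is exponential in the number of sources and is used here only for brevity. The abstract reports that the authors' compilation scales to large numbers of sources, so their method presumably avoids this blow-up.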