iTrails: pay-as-you-go information integration in dataspaces

  • Authors:
  • Marcos Antonio Vaz Salles;Jens-Peter Dittrich;Shant Kirakos Karakashian;Olivier René Girard;Lukas Blunschi

  • Affiliations:
  • ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland

  • Venue:
  • VLDB '07 Proceedings of the 33rd international conference on Very large data bases
  • Year:
  • 2007

Abstract

Dataspace management has recently been identified as a new agenda for information management [17, 22] and information integration [23]. In sharp contrast to standard information integration architectures, a dataspace management system is a data-coexistence approach: it does not require any investment in semantic integration before querying services on the data are provided. Rather, a dataspace can be enhanced gradually over time by defining relationships among the data. Defining those integration semantics gradually is termed pay-as-you-go information integration [17], as time and effort (pay) are needed over time (go) to provide integration semantics. The benefit is better query results (gain). This paper is the first to explore pay-as-you-go information integration in dataspaces. We provide a technique for declarative pay-as-you-go information integration named iTrails. The core idea of our approach is to declaratively add lightweight 'hints' (trails) to a search engine, thus allowing gradual enrichment of loosely integrated data sources. Our experiments confirm that iTrails can be implemented efficiently, introducing only little overhead during query execution. At the same time, iTrails strongly improves the quality of query results. Furthermore, we present rewriting and pruning techniques that allow us to scale iTrails to tens of thousands of trail definitions with minimal growth in the rewritten query size.
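
To make the idea of adding hints to a search engine concrete, the following is a minimal illustrative sketch and not the iTrails implementation. The abstract does not specify the trail syntax or the rewriting algorithm, so everything here is an assumption: a hypothetical `Trail` maps one keyword to one alternative keyword, `rewrite` expands each query term into a group of alternatives, and `search` runs the expanded query over a toy in-memory collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trail:
    """Hypothetical hint: when `match` occurs in a query, also accept `expansion`."""
    match: str
    expansion: str


def rewrite(query_terms, trails):
    """Expand each query term into a list of alternatives (the term plus trail expansions)."""
    rewritten = []
    for term in query_terms:
        alternatives = [term]
        for trail in trails:
            if trail.match == term and trail.expansion not in alternatives:
                alternatives.append(trail.expansion)
        rewritten.append(alternatives)
    return rewritten


def search(documents, query_terms, trails=()):
    """Return sorted ids of documents that contain at least one alternative for every term."""
    groups = rewrite(query_terms, trails)
    hits = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if all(any(alt in words for alt in group) for group in groups):
            hits.append(doc_id)
    return sorted(hits)


# Toy data, invented for illustration only.
documents = {
    "a": "car for sale",
    "b": "automobile for sale",
    "c": "bike for sale",
}
```

Under these assumptions, `search(documents, ["car"])` matches only document `a`, while passing `[Trail("car", "automobile")]` also brings in document `b`. Each additional trail is a small, incremental investment that widens what the same query can find, which is the pay-as-you-go pattern the abstract describes.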