Completeness of integrated information sources

  • Authors:
  • Felix Naumann; Johann-Christoph Freytag; Ulf Leser

  • Affiliations:
  • Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany;Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany;Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany

  • Venue:
  • Information Systems - Special issue: Data quality in cooperative information systems
  • Year:
  • 2004

Abstract

For many information domains there are numerous World Wide Web data sources. The sources vary both in their extension and their intension: they represent different real-world entities with possible overlap and provide different attributes of these entities. Mediator-based information systems allow integrated access to such sources by providing a common schema against which the user can pose queries. Given a query, the mediator must determine which participating sources to access and how to integrate the incoming results. This article describes how to support mediators in their source selection and query planning process. We propose three new merge operators, which formalize the integration of multiple source responses. A completeness model describes the usefulness of a source to answer a query. The completeness measure incorporates both extensional value (called coverage) and intensional value (called density) of a source. We show how to determine the completeness of single sources and of combinations of sources under the new merge operators. Finally, we show how to use the measure for source selection and query planning.
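
The abstract does not give the formulas behind the completeness measure, so the following Python sketch is only an illustration of how a coverage and density score for a single source might be combined. It rests on several assumptions that are not stated above: that coverage is the fraction of real-world entities a source contains, that density is the fraction of non-null attribute values among the entities it does contain, and that completeness is their product. The function names and the `world_size` parameter are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# One source tuple: attribute name -> value, with None standing for a missing value.
Row = Mapping[str, Optional[str]]


def coverage(rows: Sequence[Row], world_size: int) -> float:
    """Assumed extensional score: share of real-world entities the source contains."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    return len(rows) / world_size


def density(rows: Sequence[Row], attributes: Sequence[str]) -> float:
    """Assumed intensional score: share of non-null cells over the common schema."""
    if not rows or not attributes:
        return 0.0
    filled = sum(
        1 for row in rows for attr in attributes if row.get(attr) is not None
    )
    return filled / (len(rows) * len(attributes))


def completeness(
    rows: Sequence[Row], attributes: Sequence[str], world_size: int
) -> float:
    """Assumed combination: product of coverage and density."""
    return coverage(rows, world_size) * density(rows, attributes)
```

Under these assumptions, a source that holds two of four known entities and fills three of its four attribute cells would score 0.5 for coverage, 0.75 for density, and 0.375 for completeness. A mediator could rank candidate sources by such a score, although the scoring of combined sources under the merge operators is not covered by this sketch.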