Using similarity-based operations for resolving data-level conflicts

  • Authors:
  • Eike Schallehn;Kai-Uwe Sattler

  • Affiliations:
  • Department of Computer Science, University of Magdeburg, Magdeburg, Germany;Department of Computer Science, University of Magdeburg, Magdeburg, Germany

  • Venue:
  • BNCOD'03 Proceedings of the 20th British national conference on Databases
  • Year:
  • 2003

Abstract

Dealing with discrepancies in data remains a major challenge in data integration systems. The problem arises both when eliminating duplicates from semantically overlapping sources and when combining complementary data from different sources. Although SQL operations such as grouping and join seem to be a viable approach, they fail if the attribute values of potential duplicates or related tuples are not equal but only similar according to certain criteria. As a solution to this problem, we present similarity-based variants of the grouping and join operators. The extended grouping operator produces groups of similar tuples; the extended join combines tuples that satisfy a given similarity condition. We describe the semantics of these operators, discuss efficient implementations for edit distance similarity, and present evaluation results. Finally, we give examples of how the operators can be used in given application scenarios.
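
To make the two operators concrete, the following is a minimal sketch of a similarity join and a similarity grouping based on edit distance. It is not the authors' implementation: the function names (`edit_distance`, `similarity_join`, `similarity_group`), the `max_dist` threshold parameter, and the choice to form groups as the transitive closure of pairwise similarity are all assumptions made for illustration. The sketch also uses naive pairwise comparison rather than the efficient implementations the abstract refers to.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_join(left, right, key, max_dist):
    """Naive nested-loop join: pair rows whose key values differ by at most max_dist edits."""
    return [(l, r) for l in left for r in right
            if edit_distance(l[key], r[key]) <= max_dist]


def similarity_group(rows, key, max_dist):
    """Group rows by the transitive closure of pairwise similarity (an assumed semantics)."""
    parent = list(range(len(rows)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if edit_distance(rows[i][key], rows[j][key]) <= max_dist:
            parent[find(i)] = find(j)

    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(find(i), []).append(row)
    return list(groups.values())


# Hypothetical sample data with spelling variants of the same names.
people = [{"name": n} for n in
          ["Schallehn", "Schalehn", "Sattler", "Satler", "Smith"]]
groups = similarity_group(people, "name", max_dist=1)
```

With a threshold of one edit, the spelling variants in the sample data fall into the same group, whereas an ordinary SQL `GROUP BY` on exact equality would keep every spelling separate. Both functions compare all pairs, so they are only suitable for small inputs.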