Evaluating indeterministic duplicate detection results

  • Authors:
  • Fabian Panse; Norbert Ritter

  • Affiliations:
  • University of Hamburg, Hamburg, Germany; University of Hamburg, Hamburg, Germany

  • Venue:
  • SUM'12 Proceedings of the 6th international conference on Scalable Uncertainty Management
  • Year:
  • 2012

Abstract

Duplicate detection is an important process for cleaning or integrating data. Because real-life data is often polluted, detecting duplicates usually involves uncertainty. To handle this uncertainty appropriately, indeterministic duplicate detection approaches have been developed, i.e. approaches in which ambiguous duplicate decisions are modeled probabilistically in the resulting data. To rate a duplicate detection approach, the quality of its detection results needs to be evaluated. In this paper, we propose several semantics for applying traditional quality evaluation measures to indeterministic duplicate detection results and, as an example, present an efficient evaluation for one of these semantics. Finally, we present some experimental results.
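
The abstract does not spell out the proposed semantics, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes an indeterministic result is given as a small set of possible worlds, each a clustering of records with a probability, and that a traditional pairwise measure (precision, recall) is averaged over those worlds. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def pairs(clustering):
    """Return the set of unordered duplicate pairs implied by a clustering."""
    result = set()
    for cluster in clustering:
        result.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return result


def precision_recall(detected, gold):
    """Pairwise precision and recall of a detected pair set against a gold standard."""
    true_positives = len(detected & gold)
    # Convention (an assumption): an empty denominator yields a score of 1.0.
    precision = true_positives / len(detected) if detected else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


def expected_precision_recall(worlds, gold):
    """Probability-weighted average of precision and recall over possible worlds.

    worlds: list of (probability, clustering) tuples whose probabilities sum to 1.
    """
    expected_precision = expected_recall = 0.0
    for probability, clustering in worlds:
        precision, recall = precision_recall(pairs(clustering), gold)
        expected_precision += probability * precision
        expected_recall += probability * recall
    return expected_precision, expected_recall


# Toy data (hypothetical): records a, b, c are true duplicates; d is a singleton.
gold = pairs([{"a", "b", "c"}, {"d"}])
worlds = [
    (0.5, [{"a", "b", "c"}, {"d"}]),   # world matching the gold standard
    (0.5, [{"a", "b"}, {"c", "d"}]),   # world with one missed and one false pair
]
exp_precision, exp_recall = expected_precision_recall(worlds, gold)
```

Enumerating worlds explicitly is only feasible for tiny examples, since the number of possible worlds can grow exponentially with the number of uncertain decisions; this is presumably why an efficient evaluation is of interest.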