Active duplicate detection

  • Authors:
  • Ke Deng;Liwei Wang;Xiaofang Zhou;Shazia Sadiq;Gabriel Pui Cheong Fung

  • Affiliations:
  • The University of Queensland, Australia;Wuhan University, China;The University of Queensland, Australia;The University of Queensland, Australia;The University of Queensland, Australia

  • Venue:
  • DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
  • Year:
  • 2010

Abstract

The aim of duplicate detection is to group records in a relation that refer to the same real-world entity, such as a person or business. Most existing work requires user-specified parameters, such as a similarity threshold, in order to conduct duplicate detection. These methods are called user-first in this paper. However, in many scenarios, pre-specification by the user is very hard and often unreliable, which limits the applicability of user-first methods. In this paper, we propose a user-last method, called Active Duplicate Detection (ADD), in which an initial solution is returned without forcing the user to specify such parameters, and the user is then involved in refining the initial solution. Unlike user-first methods, where the user makes decisions before any processing, ADD allows the user to make decisions based on an initial solution. The initial solution identified by ADD is of comparatively high quality and is easy to refine in a systematic way (at almost zero cost).
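
To make the user-first setting concrete, the following is a minimal sketch of threshold-based duplicate grouping. It is not the ADD method, which the abstract does not specify. The function names `similarity` and `group_duplicates`, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the sample records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(records: list[str], threshold: float) -> list[list[str]]:
    """Group records whose pairwise similarity reaches the threshold.

    Hypothetical user-first baseline: the caller must choose `threshold`
    before seeing any result. Matches are merged transitively (union-find).
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: dict[int, list[str]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


# Illustrative records (made up for this sketch).
records = ["John Smith", "john smith", "Acme Corp"]
strict = group_duplicates(records, threshold=0.9)
loose = group_duplicates(records, threshold=0.0)
```

With a strict threshold the two spellings of the person's name fall into one group and the business stays separate, while a threshold of zero merges everything into a single group. The result depends entirely on a value the user has to pick in advance, which is the difficulty the abstract attributes to user-first methods.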