ALIAS: an active learning led interactive deduplication system

  • Authors:
  • Sunita Sarawagi;Anuradha Bhamidipaty;Alok Kirpal;Chandra Mouli

  • Affiliations:
  • Indian Institute of Technology, Bombay;Indian Institute of Technology, Bombay;Indian Institute of Technology, Bombay;Indian Institute of Technology, Bombay

  • Venue:
  • VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
  • Year:
  • 2002

Abstract

Deduplication, a key operation in integrating data from multiple sources, is a time-consuming, labor-intensive and domain-specific operation. We present the design of ALIAS, which uses a novel approach to ease this task by limiting the manual effort to inputting simple, domain-specific attribute similarity functions and interactively labeling a small number of record pairs. We describe how active learning is useful in selecting informative examples of duplicates and nonduplicates that can be used to train a deduplication function. ALIAS provides a mechanism for efficiently applying the function to large lists of records using a novel cluster-based execution model.
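
The idea of letting the learner choose which record pairs to label can be illustrated with a minimal sketch. This is not the ALIAS implementation: it assumes a single similarity score per pair, a one-dimensional threshold classifier, and plain uncertainty sampling, and all function names (similarity, fit_threshold, most_uncertain, active_learn) are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Hypothetical attribute similarity function: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fit_threshold(labeled):
    """Place the decision threshold midway between the least similar labeled
    duplicate and the most similar labeled non-duplicate.

    labeled: list of (score, is_duplicate); needs at least one of each class.
    """
    dup = [score for score, is_dup in labeled if is_dup]
    non = [score for score, is_dup in labeled if not is_dup]
    return (min(dup) + max(non)) / 2


def most_uncertain(unlabeled, threshold):
    """Uncertainty sampling: pick the pair whose score lies closest to the threshold."""
    return min(unlabeled, key=lambda item: abs(item[1] - threshold))


def active_learn(pairs, oracle, seed_labeled, rounds):
    """Interactively refine the threshold.

    pairs: list of (pair_id, score) that are still unlabeled.
    oracle: callable(pair_id) -> bool, standing in for the human labeler.
    seed_labeled: initial list of (score, is_duplicate).
    """
    labeled = list(seed_labeled)
    unlabeled = list(pairs)
    for _ in range(rounds):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        item = most_uncertain(unlabeled, threshold)
        unlabeled.remove(item)
        # Ask the labeler only about the pair the current model is least sure of.
        labeled.append((item[1], oracle(item[0])))
    return fit_threshold(labeled)
```

In this sketch each round asks for a label only on the pair nearest the current decision boundary, so a few labels narrow the threshold quickly. A real system would combine several attribute similarity functions and a richer classifier; the scores passed in as pairs could be produced by similarity or any other domain-specific function.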