Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework

  • Authors:
  • Hamid Haidarian Shahri; Saied Haidarian Shahri

  • Affiliations:
  • University of Maryland; University of Tehran

  • Venue:
  • IEEE Intelligent Systems
  • Year:
  • 2006

Abstract

Approximate duplicate elimination is an important data-integration task, but it is difficult because it requires complex comparisons of many records that involve uncertainty and ambiguity. Earlier approaches required a time-consuming and tedious process of hard-coding static rules based on a schema. A novel duplicate-elimination framework now lets users clean data flexibly and effortlessly, without any coding. Exploiting fuzzy inference inherently handles the problem's uncertainty, and unique machine learning capabilities let the framework adapt to the specific notion of similarity appropriate for each domain. The framework is extensible and accommodative, letting the user operate with or without training data. Additionally, many of the previous methods for duplicate elimination can be implemented quickly using this framework.
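To make the idea of fuzzy inference over record comparisons concrete, here is a minimal sketch of a zero-order Sugeno-style fuzzy system that scores whether two records are duplicates. It is not the authors' framework: the field names (name, address), the difflib-based string similarity, the linear "high similarity" membership ramp, the three-rule base, and the 0.8 threshold are all hypothetical choices made for illustration. In the framework described above, such rules and membership functions would be adapted to each domain (for example, learned from training data) rather than fixed by hand.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio; an assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def high(x: float) -> float:
    """Membership in the fuzzy set 'high similarity': 0 below 0.5, linear ramp up to 1 at 1.0."""
    return min(1.0, max(0.0, (x - 0.5) / 0.5))


def low(x: float) -> float:
    """Membership in 'low similarity', defined as the complement of 'high'."""
    return 1.0 - high(x)


# Hypothetical rule base: (firing-strength function, crisp consequent "duplicate degree").
# Firing strength uses min() as the fuzzy AND.
RULES = [
    (lambda n, a: min(high(n), high(a)), 1.0),  # names and addresses similar -> duplicate
    (lambda n, a: min(high(n), low(a)), 0.5),   # names similar, addresses differ -> uncertain
    (lambda n, a: low(n), 0.0),                 # names differ -> distinct
]


def duplicate_score(r1: dict, r2: dict) -> float:
    """Weighted average of rule consequents, weighted by each rule's firing strength."""
    n = similarity(r1["name"], r2["name"])
    a = similarity(r1["address"], r2["address"])
    fired = [(rule(n, a), out) for rule, out in RULES]
    total = sum(w for w, _ in fired)
    if total == 0.0:
        return 0.0
    return sum(w * out for w, out in fired) / total


def find_duplicates(records: list, threshold: float = 0.8) -> list:
    """Return index pairs whose fuzzy duplicate score meets the (assumed) threshold."""
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(records), 2)
        if duplicate_score(r1, r2) >= threshold
    ]
```

With this rule base, two identical records score 1.0, records that share a name but have unrelated addresses score 0.5, and records with unrelated names score 0.0. Because the scoring is driven by a declarative rule table rather than hard-coded comparisons, swapping in different rules or membership functions changes the notion of similarity without rewriting the comparison logic, which loosely mirrors the extensibility the abstract describes.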