Profile-Based Object Matching for Information Integration

  • Authors:
  • AnHai Doan;Ying Lu;Yoonkyong Lee;Jiawei Han

  • Affiliations:
  • University of Illinois, Urbana-Champaign;University of Illinois, Urbana-Champaign;University of Illinois, Urbana-Champaign;University of Illinois, Urbana-Champaign

  • Venue:
  • IEEE Intelligent Systems
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Object matching is a fundamental problem that arises in numerous information integration scenarios. Virtually all existing solutions assume that the objects to be matched share the same attribute set and that systems can match them by comparing attribute similarities. Our work addresses the more general problem in which objects also have disjoint attributes-for example, matching tuples from relational tables that have different schemas, such as (age, name) and (name, salary). Profile-Based Object Matching, which applies this idea, exploits disjoint attributes to improve matching accuracy. PROM first matches any two tuples based on a shared attribute, such as name. It then applies a set of profilers, each of which contains some knowledge about what constitutes a typical person. The profilers examine the tuple pair to see if it plausibly describes a person. A profiler might state, for example, that if the pair produces a person with an age of 6 and a salary of $100,000, the pair doesn't describe a real person, so the tuples don't match. Profilers can be manually specified by domain experts, trained on training data, transferred from other matching tasks, or built from external data. PROM is thus distinct in that it not only exploits disjoint attributes to improve matching accuracy but also facilitates knowledge reuse from previous object-matching tasks.