Profile-Based Object Matching for Information Integration

Authors:
AnHai Doan;Ying Lu;Yoonkyong Lee;Jiawei Han
Affiliations:
University of Illinois, Urbana-Champaign;University of Illinois, Urbana-Champaign;University of Illinois, Urbana-Champaign;University of Illinois, Urbana-Champaign
Venue:
IEEE Intelligent Systems
Year:
2003

Citing 16
Cited 18

The merge/purge problem for large databases

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Integration of heterogeneous databases without common domains using queries based on textual similarity

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Autonomous citation matching

Proceedings of the third annual conference on Autonomous Agents
Efficient clustering of high-dimensional data sets with application to reference matching

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Reconciling schemas of disparate data sources: a machine-learning approach

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Multistrategy Learning for Information Extraction

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Potter's Wheel: An Interactive Data Cleaning System

Proceedings of the 27th International Conference on Very Large Data Bases
Database Schema Matching Using Machine Learning with Feature Selection

CAiSE '02 Proceedings of the 14th International Conference on Advanced Information Systems Engineering
Interactive deduplication using active learning

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning domain-independent string transformation weights for high accuracy object identification

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning to match and cluster large high-dimensional data sets for data integration

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
On schema matching with opaque column names and data values

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Probabilistic reasoning for entity & relation recognition

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Eliminating fuzzy duplicates in data warehouses

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
COMA: a system for flexible combination of schema matching approaches

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Transferring and retraining learned information filters

AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence

Automatically utilizing secondary sources to align information across sources

AI Magazine - Special issue on semantic integration
Semantic integration in text: from ambiguous names to identifiable entities

AI Magazine - Special issue on semantic integration
Semantic-integration research in the database community

AI Magazine - Special issue on semantic integration
From Wrapping to Knowledge

IEEE Transactions on Knowledge and Data Engineering
Matching knowledge elements in concept maps using a similarity flooding algorithm

Decision Support Systems
Integration of Ontology Data through Learning Instance Matching

WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
A strategy for allowing meaningful and comparable scores in approximate matching

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Entity matching across heterogeneous data sources: An approach based on constrained cascade generalization

Data & Knowledge Engineering
A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information

GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
A strategy for allowing meaningful and comparable scores in approximate matching

Information Systems
A strategy for allowing meaningful and comparable scores in approximate matching

Information Systems
Identification and tracing of ambiguous names: discriminative and generative approaches

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Constraint-based entity matching

AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Generic entity resolution with negative rules

The VLDB Journal — The International Journal on Very Large Data Bases
Entity-aware query processing for heterogeneous data with uncertainty and correlations

Proceedings of the 2009 EDBT/ICDT Workshops
Duplicate detection through structure optimization

Proceedings of the 20th ACM international conference on Information and knowledge management
XML duplicate detection using sorted neighborhoods

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
A semantic enrichment of data tables applied to food risk assessment

DS'05 Proceedings of the 8th international conference on Discovery Science

Quantified Score

Hi-index	0.00

Visualization

Abstract

Object matching is a fundamental problem that arises in numerous information integration scenarios. Virtually all existing solutions assume that the objects to be matched share the same attribute set and that systems can match them by comparing attribute similarities. Our work addresses the more general problem in which objects also have disjoint attributes-for example, matching tuples from relational tables that have different schemas, such as (age, name) and (name, salary). Profile-Based Object Matching, which applies this idea, exploits disjoint attributes to improve matching accuracy. PROM first matches any two tuples based on a shared attribute, such as name. It then applies a set of profilers, each of which contains some knowledge about what constitutes a typical person. The profilers examine the tuple pair to see if it plausibly describes a person. A profiler might state, for example, that if the pair produces a person with an age of 6 and a salary of $100,000, the pair doesn't describe a real person, so the tuples don't match. Profilers can be manually specified by domain experts, trained on training data, transferred from other matching tasks, or built from external data. PROM is thus distinct in that it not only exploits disjoint attributes to improve matching accuracy but also facilitates knowledge reuse from previous object-matching tasks.