Object identification with attribute-mediated dependences

Authors:
Parag Singla; Pedro Domingos
Affiliations:
Department of Computer Science and Engineering, University of Washington, Seattle, WA; Department of Computer Science and Engineering, University of Washington, Seattle, WA
Venue:
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2005

Abstract

Object identification is the problem of determining whether different observations correspond to the same object. It occurs in a wide variety of fields, including vision, natural language, citation matching, and information integration. Traditionally, the problem is solved separately for each pair of observations, followed by transitive closure. We propose solving it collectively, performing simultaneous inference for all candidate match pairs, and allowing information to propagate from one candidate match to another via the attributes they have in common. Our formulation is based on conditional random fields, and allows an optimal solution to be found in polynomial time using a graph cut algorithm. Parameters are learned using a voted perceptron algorithm. Experiments on real and synthetic datasets show that this approach outperforms the standard one.
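
For contrast with the collective approach, the traditional baseline the abstract mentions (an independent decision per pair, followed by transitive closure) can be sketched as below. This is only an illustration, not the authors' code: the token-set Jaccard similarity, the 0.5 threshold, and the names `jaccard`, `find`, and `pairwise_then_closure` are assumptions chosen for brevity, and a real system would substitute a learned pairwise matcher.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (hypothetical stand-in for a learned matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def find(parent: list[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def pairwise_then_closure(records: list[str], threshold: float = 0.5) -> list[set[int]]:
    """Decide each pair independently, then merge matches by transitive closure."""
    parent = list(range(len(records)))
    for i, j in combinations(range(len(records)), 2):
        # Each pair is judged in isolation; no information flows between pairs.
        if jaccard(records[i], records[j]) >= threshold:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[rj] = ri
    # Group record indices by their union-find root.
    clusters: dict[int, set[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(parent, i), set()).add(i)
    return sorted(clusters.values(), key=min)


records = ["a b c", "a b c d", "b c d e", "x y z"]
clusters = pairwise_then_closure(records)
```

In this toy input, records 0 and 2 fall below the threshold when compared directly (similarity 0.4) but end up in the same cluster because each matches record 1. The closure step is the only place where pairwise decisions interact, which is the limitation the collective formulation is meant to address.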