Object identification is the problem of determining whether different observations correspond to the same object. It occurs in a wide variety of fields, including vision, natural language, citation matching, and information integration. Traditionally, the problem is solved separately for each pair of observations, followed by transitive closure. We propose solving it collectively, performing simultaneous inference for all candidate match pairs, and allowing information to propagate from one candidate match to another via the attributes they have in common. Our formulation is based on conditional random fields, and allows an optimal solution to be found in polynomial time using a graph cut algorithm. Parameters are learned using a voted perceptron algorithm. Experiments on real and synthetic datasets show that this approach outperforms the standard one.
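To make the traditional baseline concrete, the following is a minimal sketch of the "pairwise decisions followed by transitive closure" step, assuming the pairwise match decisions have already been made by some classifier. It is not the collective method proposed here; the function name `transitive_closure` and the integer observation ids are hypothetical choices for illustration.

```python
from collections import defaultdict


def transitive_closure(n, matched_pairs):
    """Group n observations (ids 0..n-1) into clusters, given pairs that were
    independently judged to be matches. Uses union-find, so that if a~b and
    b~c are matched, a and c end up in the same cluster."""
    parent = list(range(n))

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


# Observations 0-1 and 1-2 were matched pairwise; 3 and 4 matched nothing.
clusters = transitive_closure(5, [(0, 1), (1, 2)])
print(clusters)  # [[0, 1, 2], [3], [4]]
```

Note that in this baseline each pairwise decision is made in isolation, so one mistaken match merges two whole clusters. The collective formulation described above instead infers all candidate pairs simultaneously.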