The task of matching co-referent records is known, among other names, as record linkage. Large record-linkage problems often come with little or no labeled data, yet the unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised ones. In this paper, we describe a hierarchical graphical model framework for the record-linkage problem in an unsupervised setting. In addition to proposing new methods, we cast existing unsupervised probabilistic record-linkage methods in this framework. Some of the techniques we propose to reduce overfitting in this model are of interest for graphical models in general. We describe a method for incorporating monotonicity constraints in a graphical model. We also outline a bootstrapping approach that uses "single-field" classifiers to noisily label latent variables in a hierarchical model. Experimental results show that our proposed unsupervised methods are competitive even with fully supervised record-linkage methods.
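
As a rough illustration of the kind of existing unsupervised probabilistic record-linkage method the abstract refers to, the sketch below fits a two-class mixture (match vs. non-match) over binary field-agreement vectors with EM, assuming the fields are conditionally independent given the class. It is not the hierarchical model of this paper, and it includes neither the monotonicity constraints nor the bootstrapping step. The function names (`posterior`, `fit_match_mixture`), the initial values, and the toy data are all assumptions made for the example.

```python
def posterior(v, p_match, m, u):
    """P(match | agreement vector v) under the two-class mixture."""
    lm, lu = p_match, 1.0 - p_match
    for k, agrees in enumerate(v):
        lm *= m[k] if agrees else 1.0 - m[k]
        lu *= u[k] if agrees else 1.0 - u[k]
    return lm / (lm + lu)


def fit_match_mixture(vectors, n_iter=50, eps=1e-6):
    """Fit the mixture with EM using no labels.

    m[k] = P(field k agrees | match), u[k] = P(field k agrees | non-match).
    The starting values below are illustrative assumptions.
    """
    def clip(x):
        # Keep probabilities strictly inside (0, 1) to avoid division by zero.
        return min(max(x, eps), 1.0 - eps)

    n, n_fields = len(vectors), len(vectors[0])
    p_match = 0.1
    m = [0.9] * n_fields
    u = [0.1] * n_fields
    for _ in range(n_iter):
        # E-step: soft match label for every record pair.
        post = [posterior(v, p_match, m, u) for v in vectors]
        # M-step: re-estimate parameters from the soft labels.
        total = sum(post)
        p_match = clip(total / n)
        for k in range(n_fields):
            agree_m = sum(w for w, v in zip(post, vectors) if v[k])
            agree_u = sum(1.0 - w for w, v in zip(post, vectors) if v[k])
            m[k] = clip(agree_m / max(total, eps))
            u[k] = clip(agree_u / max(n - total, eps))
    return p_match, m, u


# Toy data: each tuple records whether three fields (say name, address,
# phone) agree for one candidate record pair. Entirely made up.
pairs = (
    [(1, 1, 1)] * 10 + [(1, 1, 0)] * 3      # pairs that look like matches
    + [(0, 0, 0)] * 60 + [(0, 0, 1)] * 10   # pairs that look like non-matches
    + [(1, 0, 0)] * 10
)
p_match, m, u = fit_match_mixture(pairs)
likely_match = posterior((1, 1, 1), p_match, m, u)
likely_nonmatch = posterior((0, 0, 0), p_match, m, u)
```

After fitting, `posterior` scores any new agreement vector; a threshold on that score then gives a match decision. Fields whose fitted `m[k]` is far above `u[k]` are the ones carrying most of the evidence.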