Record linkage is the process of determining that two records refer to the same entity. A key subprocess is evaluating how well the individual fields, or attributes, of the records match each other. One approach to matching fields is to use hand-written, domain-specific rules. This "expert systems" approach may perform well for specific applications, but it is not scalable. This paper describes a new machine learning approach that creates expert-like rules for field matching. In our approach, the relationship between two field values is described by a set of heterogeneous transformations. Previous machine learning methods used simple models to evaluate the distance between two fields. Our approach, in contrast, enables more sophisticated relationships to be modeled, which better capture the complex, domain-specific, common-sense phenomena that humans use to judge similarity. We compare our approach to methods that rely on simpler homogeneous models in several domains. By modeling more complex relationships, we produce more accurate results.
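As a rough illustration of what describing a field pair by heterogeneous transformations might look like, the sketch below labels each token of one field value with the kind of relationship it has to a token in the other value. The transformation names (`equal`, `initial`, `prefix`, `missing`) and the function `describe_relationship` are assumptions made for this example; the abstract does not list the paper's actual transformation types, and the learning step that weights them is not shown.

```python
def describe_relationship(a: str, b: str) -> list[str]:
    """Label how each token of `a` relates to a token of `b`.

    Hypothetical transformation types, chosen only for illustration:
      equal   - tokens are identical
      initial - one token is a single letter that starts the other
      prefix  - one token is a longer prefix (abbreviation) of the other
      missing - token has no counterpart in the other value
    """
    tokens_a = a.lower().replace(".", "").split()
    unmatched_b = b.lower().replace(".", "").split()
    labels = []
    for ta in tokens_a:
        label = "missing"
        for tb in unmatched_b:
            if ta == tb:
                label = "equal"
            elif (len(ta) == 1 and tb.startswith(ta)) or (
                len(tb) == 1 and ta.startswith(tb)
            ):
                label = "initial"
            elif ta.startswith(tb) or tb.startswith(ta):
                label = "prefix"
            else:
                continue  # no relationship with this token, try the next
            unmatched_b.remove(tb)  # each token of b is used at most once
            break
        labels.append(label)
    # Tokens of b that were never paired are also reported as missing.
    labels.extend("missing" for _ in unmatched_b)
    return labels


print(describe_relationship("J. Smith", "John Smith"))
print(describe_relationship("Univ of Texas", "University of Texas"))
```

A single distance score would collapse both pairs into one number, whereas the list of labels keeps the different kinds of evidence separate, so a learner could in principle weight an initial differently from a missing token.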