Despite the large body of recent research on entity resolution (matching), there has not yet been a comparative evaluation of the relative effectiveness and efficiency of alternative approaches. We therefore present such an evaluation of existing implementations on challenging real-world match tasks. We consider approaches both with and without machine learning for finding a suitable parameterization and combination of similarity functions. In addition to approaches from the research community, we also consider a state-of-the-art commercial entity resolution implementation. Our results indicate significant differences in quality and efficiency between the approaches. We also find that some challenging resolution tasks, such as matching product entities from online shops, are not sufficiently solved by conventional approaches based on the similarity of attribute values.