We consider the entity resolution (ER) problem (also known as deduplication or merge-purge), in which records determined to represent the same real-world entity are successively located and merged. We formalize the generic ER problem, treating the functions for comparing and merging records as black boxes, which permits expressive and extensible ER solutions. We identify four important properties that, if satisfied by the match and merge functions, enable much more efficient ER algorithms. We develop three efficient ER algorithms: G-Swoosh for the case where the four properties do not hold, and R-Swoosh and F-Swoosh, which exploit the four properties. F-Swoosh additionally assumes knowledge of the "features" (e.g., attributes) used by the match function. We experimentally evaluate the algorithms using comparison shopping data from Yahoo! Shopping and hotel information data from Yahoo! Travel. We also show that R-Swoosh (and F-Swoosh) can be used even when the four match and merge properties do not hold, if an "approximate" result is acceptable.
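To make the black-box formulation concrete, here is a minimal Python sketch of a generic match-and-merge loop. It illustrates the general idea only and should not be read as the exact G-Swoosh, R-Swoosh, or F-Swoosh procedure. The record model (a frozen set of "attribute=value" strings), the function names `resolve`, `share_value`, and `union`, and the sample data are all assumptions introduced for this example. The loop is only expected to give a clean result when the supplied match and merge functions are well behaved, as set union and "shares a value" are here; with other functions the output may be approximate.

```python
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical record model: a set of "attribute=value" strings.
Record = FrozenSet[str]


def resolve(
    records: Iterable[Record],
    match: Callable[[Record, Record], bool],
    merge: Callable[[Record, Record], Record],
) -> List[Record]:
    """Repeatedly locate matching records and merge them.

    `match` and `merge` are treated as black boxes: the loop never
    inspects record contents itself.
    """
    pending = list(records)        # records still to be processed
    resolved: List[Record] = []    # records with no match among themselves
    while pending:
        current = pending.pop()
        # Look for any already-resolved record that matches the current one.
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # Replace the pair by its merge and re-process the merged record,
            # since it may now match records that neither input matched.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


def share_value(a: Record, b: Record) -> bool:
    """Illustrative match function: records match if they share any value."""
    return bool(a & b)


def union(a: Record, b: Record) -> Record:
    """Illustrative merge function: keep every value from both records."""
    return a | b


records = [
    frozenset({"name=J. Doe", "phone=555-0100"}),
    frozenset({"name=John Doe", "phone=555-0100"}),
    frozenset({"name=John Doe", "email=jd@example.com"}),
    frozenset({"name=A. Smith"}),
]
result = resolve(records, share_value, union)
```

In this sample the first three records are linked transitively (the first two share a phone number, the last two share a name), so they collapse into one merged record while the fourth stays separate. Each merge reduces the total number of records by one, so the loop terminates.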