Record linkage, the problem of determining when two records refer to the same entity, has applications in both data cleaning (deduplication) and the integration of data from multiple sources. Traditional approaches use a similarity measure that compares tuples' attribute values; tuples whose similarity score exceeds a certain threshold are declared matches. This method can perform quite well in many domains, particularly those where the data contains little noise, but in some domains looking only at tuple values is not enough. By also examining the context of a tuple, i.e., the other tuples to which it is linked, we can reach a more accurate linkage decision. This additional accuracy comes at a price: to correctly find all duplicates, we may need to make multiple passes over the data, because each linkage discovered may in turn allow us to discover additional linkages. We present results that illustrate the power and feasibility of using join information when comparing records.
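The idea of combining attribute similarity with link context, and of repeating passes until no new linkages appear, can be illustrated with a small sketch. This is not the paper's algorithm: the token-based Jaccard measure, the weighted score, the union-find bookkeeping, the toy records, and every name and parameter below (`link_records`, `threshold`, `context_weight`) are assumptions chosen only for illustration.

```python
from itertools import combinations


def name_similarity(a, b):
    """Jaccard similarity over lowercase name tokens (an assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def context_similarity(parent, links, a, b):
    """Jaccard overlap of the entities that a's and b's linked records resolve to."""
    ca = {find(parent, n) for n in links[a]}
    cb = {find(parent, n) for n in links[b]}
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)


def link_records(names, links, threshold=0.5, context_weight=0.4):
    """Repeat passes over all record pairs until a pass finds no new linkage."""
    parent = {r: r for r in names}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for a, b in combinations(sorted(names), 2):
            ra, rb = find(parent, a), find(parent, b)
            if ra == rb:
                continue  # already linked
            score = ((1 - context_weight) * name_similarity(names[a], names[b])
                     + context_weight * context_similarity(parent, links, a, b))
            if score >= threshold:
                parent[rb] = ra  # record the new linkage
                changed = True
    clusters = {}
    for r in names:
        clusters.setdefault(find(parent, r), set()).add(r)
    return sorted(sorted(c) for c in clusters.values()), passes


# Hypothetical toy data: two author records with dissimilar names, each
# linked (e.g. by co-authorship) to a record named "Ann Lee".
names = {"a1": "J Smith", "a2": "John Smith", "b1": "Ann Lee", "b2": "Ann Lee"}
links = {"a1": {"b1"}, "a2": {"b2"}, "b1": {"a1"}, "b2": {"a2"}}

with_context, passes = link_records(names, links)
values_only, _ = link_records(names, links, context_weight=0.0)
```

In this toy data, the first pass links `b1` and `b2` on name similarity alone. Only after that linkage exists do `a1` and `a2` share context, so they are linked on the second pass, and a third pass confirms that nothing else changes. With `context_weight=0.0` the comparison looks only at tuple values, and `a1` and `a2` are never linked.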