The merge/purge problem for large databases
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Record linkage: making maximum use of the discriminating power of identifying information
Communications of the ACM
Automated name authority control
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Interactive deduplication using active learning
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Text joins in an RDBMS for web data integration
WWW '03 Proceedings of the 12th international conference on World Wide Web
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Comparative study of name disambiguation problem using a scalable blocking-based framework
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
A Fast Linkage Detection Scheme for Multi-Source Information Integration
WIRI '05 Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration
Adaptive Name Matching in Information Integration
IEEE Intelligent Systems
Eliminating fuzzy duplicates in data warehouses
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Learning blocking schemes for record linkage
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Industry-scale duplicate detection
Proceedings of the VLDB Endowment
The Normalized Compression Distance as a Distance Measure in Entity Identification
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
Robust record linkage blocking using suffix arrays
Proceedings of the 18th ACM conference on Information and knowledge management
Robust Record Linkage Blocking Using Suffix Arrays and Bloom Filters
ACM Transactions on Knowledge Discovery from Data (TKDD)
Eliminating the redundancy in blocking-based entity resolution methods
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Ontology and instance matching
Knowledge-driven multimedia information extraction and ontology evolution
Automatically generating data linkages using a domain-independent candidate selection approach
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
Leveraging unlabeled data to scale blocking for record linkage
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
Classification rule learning for data linking
Proceedings of the 2012 Joint EDBT/ICDT Workshops
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part II
Domain-Independent Entity Coreference for Linking Ontology Instances
Journal of Data and Information Quality (JDIQ) - Special Issue on Entity Resolution
Towards scalable real-time entity resolution using a similarity-aware inverted index approach
AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
Accuracy vs. Speed: Scalable Entity Coreference on the Semantic Web with On-the-Fly Pruning
WI-IAT '12 Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Efficient two-party private blocking based on sorted nearest neighborhood clustering
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.01 |
Traditionally, record linkage algorithms have played an important role in maintaining digital libraries - i.e., identifying matching citations or authors for consolidation in updating or integrating digital libraries. As such, a variety of record linkage algorithms have been developed and deployed successfully. Often, however, existing solutions have a set of parameters whose values are set by human experts off-lineand are fixed during the execution. Since finding the ideal values of such parameters is not straightforward, or no such single ideal value even exists, the applicability of existing solutions to new scenarios or domains is greatly hampered. To remedy this problem, we argue that one can achieve significant improvement by adaptively and dynamically changing such parameters of record linkage algorithms. To validate our hypothesis, we take a classical record linkage algorithm, the sorted neighborhood method (SNM), and demonstrate how we can achieve improved accuracy and performance by adaptively changing its fixed sliding window size. Our claim is analytically and empirically validated using both real and synthetic data sets of digital libraries and other domains.