Record linkage refers to techniques for identifying records that correspond to the same real-world entity. Record linkage is crucial not only for integrating multi-source databases that have been generated independently, but is also considered one of the key issues in integrating heterogeneous Web resources. However, for large-scale data, the cost of enumerating all possible linkages often becomes prohibitively high. Against this background, this paper proposes a fast and efficient method for linkage detection. The proposed approach has two main features. First, it exploits a suffix array structure that enables linkage detection using variable-length n-grams. Second, it dynamically generates blocks of possibly associated records using 'blocking keys' extracted from already known, reliable linkages. The paper also reports the results of preliminary experiments in which the proposed method was applied to integrating four bibliographic databases that scale up to more than 10 million records.
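To make the suffix-array idea more concrete, the following is a minimal sketch, not the authors' algorithm, of one way a suffix array can group records that share a substring of *any* length at or above a threshold. This is what "variable-length n-grams" suggests. It assumes that records are plain strings, that the character `"\x00"` never occurs in them, and that records sharing a substring of at least `min_len` characters are candidate linkages. The names `candidate_pairs`, `min_len`, and `max_block` are hypothetical. Suffixes that share a prefix of length `min_len` are contiguous in the sorted suffix array, so each maximal run of adjacent suffixes with a long enough common prefix forms one block.

```python
from itertools import combinations

SEP = "\x00"  # assumption: this separator never occurs inside a record string


def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # Uses quadratic memory; fine for a sketch, not for 10M-record inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(text, i, j):
    # Length of the shared prefix of text[i:] and text[j:], cut off at SEP
    # so that a match never runs across the boundary between two records.
    n = 0
    while (i + n < len(text) and j + n < len(text)
           and text[i + n] == text[j + n] and text[i + n] != SEP):
        n += 1
    return n


def candidate_pairs(records, min_len=4, max_block=None):
    """Return id pairs of records sharing a substring of >= min_len characters.

    Hypothetical helper: each maximal run of adjacent suffixes whose common
    prefix is at least min_len characters long is treated as one block.
    """
    parts, owner = [], []  # owner[p] = id of the record covering text position p
    for rid, rec in enumerate(records):
        rec = rec.lower()  # simple normalization (assumption)
        parts.append(rec + SEP)
        owner.extend([rid] * (len(rec) + 1))
    text = "".join(parts)
    if not text:
        return set()

    sa = build_suffix_array(text)
    pairs = set()

    def flush(block):
        # Optionally skip oversized blocks: very common substrings
        # discriminate poorly between entities.
        if max_block is None or len(block) <= max_block:
            pairs.update(combinations(sorted(block), 2))

    block = {owner[sa[0]]}
    for prev, cur in zip(sa, sa[1:]):
        if common_prefix_len(text, prev, cur) >= min_len:
            block.add(owner[cur])  # same run: same candidate block
        else:
            flush(block)
            block = {owner[cur]}  # start a new run
    flush(block)
    return pairs


titles = ["Robust record linkage",
          "robust record-linkage blocking",
          "Suffix arrays for text"]
print(candidate_pairs(titles, min_len=6))  # {(0, 1)}
```

Only record pairs that fall into a common block would then be passed to a more expensive similarity comparison, which avoids enumerating all possible pairs. The hypothetical `max_block` cap is one simple guard against substrings so common that their blocks grow too large to be useful.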