We propose new data structures that speed up Record Linkage by taking advantage of the value distribution of common string attributes, such as name or surname. At the cost of some additional memory, we increase processing speed by almost an order of magnitude without any loss of recall or precision. The improvement is independent of the method used to reduce the number of record comparisons, such as Blocking or Sliding Window, and of the specific string comparison function.
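The abstract does not describe the data structures themselves. As a hedged illustration of the general idea of trading memory for speed on skewed string attributes, the sketch below assumes a simple swapped-in technique: intern each distinct value to an integer id and memoize the comparator result per distinct value pair, so a common surname is compared against another value only once. This is not presented as the authors' method. All names (`levenshtein`, `CachedComparator`, `link_blocking`, `link_sliding_window`) are hypothetical, and the cache assumes a symmetric comparator. Because the cache only wraps the comparison function, the same object plugs into both a blocking pass and a sliding-window pass, echoing the claim that the speedup is independent of the comparison-reduction method.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance; symmetric, so (a, b) and (b, a) agree."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class CachedComparator:
    """Hypothetical helper: run the comparator once per distinct value pair.

    Values are interned to integer ids and scores are memoized per unordered
    id pair, so repeated values (e.g. common surnames) become dict lookups.
    Assumes `compare` is symmetric.
    """

    def __init__(self, compare=levenshtein):
        self.compare = compare
        self.ids = {}       # value -> integer id (extra memory)
        self.cache = {}     # (smaller id, larger id) -> score (extra memory)
        self.computed = 0   # how many times `compare` actually ran

    def _id(self, value: str) -> int:
        return self.ids.setdefault(value, len(self.ids))

    def __call__(self, a: str, b: str) -> int:
        i, j = self._id(a), self._id(b)
        key = (i, j) if i <= j else (j, i)
        if key not in self.cache:
            self.cache[key] = self.compare(a, b)
            self.computed += 1
        return self.cache[key]


def link_blocking(records, block_key, field, threshold, compare):
    """Blocking: compare all record pairs that share a blocking key."""
    blocks = defaultdict(list)
    for rid, rec in enumerate(records):
        blocks[block_key(rec)].append(rid)
    return [(x, y)
            for ids in blocks.values()
            for x, y in combinations(ids, 2)
            if compare(records[x][field], records[y][field]) <= threshold]


def link_sliding_window(records, sort_key, field, window, threshold, compare):
    """Sliding window: compare each record with the next window - 1 in sort order."""
    order = sorted(range(len(records)), key=lambda r: sort_key(records[r]))
    matches = []
    for pos, x in enumerate(order):
        for y in order[pos + 1:pos + window]:
            if compare(records[x][field], records[y][field]) <= threshold:
                matches.append((min(x, y), max(x, y)))
    return matches


# Hypothetical toy data with a skewed surname distribution.
records = [
    {"surname": "Smith", "zip": "10001"},
    {"surname": "Smith", "zip": "10001"},
    {"surname": "Smyth", "zip": "10001"},
    {"surname": "Smith", "zip": "10001"},
    {"surname": "Jones", "zip": "10001"},
    {"surname": "Jones", "zip": "20002"},
    {"surname": "Jonas", "zip": "20002"},
]

comparator = CachedComparator()
print(link_blocking(records, lambda r: r["zip"], "surname", 1, comparator))
# 11 candidate pairs are checked, but levenshtein runs only 5 times.
print(comparator.computed)
```

On this toy input, blocking by `zip` produces 11 candidate pairs but only 5 distinct value pairs, so the cache skips more than half of the string comparisons while returning exactly the scores the plain comparator would. The gain grows with how skewed the attribute's value distribution is.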