Record linkage is the task of identifying records from disparate data sources that refer to the same entity. It is an integral component of data processing in distributed settings, where integrating information from multiple sources can prevent duplication and improve overall data quality, thus enabling more detailed and correct analysis. Privacy-preserving record linkage (PPRL) is a variant of the task in which data owners wish to perform linkage without revealing the identifiers associated with their records. PPRL is desirable in various domains, including healthcare, where patient identities may not be revealed due to confidentiality requirements, and business, where divulging customers' identities could be disadvantageous. To perform PPRL, it is necessary to apply string comparators that function in the privacy-preserving space. A number of privacy-preserving string comparators (PPSCs) have been proposed, but little research has compared them in the context of a real record linkage application. This paper performs a principled and comprehensive evaluation of six PPSCs in terms of three key properties: (1) correctness of record linkage predictions, (2) computational complexity, and (3) security. We use a real, publicly available dataset derived from the North Carolina voter registration database to evaluate the trade-offs among these properties. Among our results, we find that PPSCs that partition, encode, and compare strings yield highly accurate record linkage results. As a trade-off, however, such PPSCs are less secure than those that map and compare strings in a reduced-dimensional space.
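The abstract does not name the six PPSCs it evaluates, so the following is only an illustrative sketch of one widely used "partition, encode, and compare" scheme: each string is partitioned into character bigrams, the bigrams are hashed with a shared secret key into a Bloom filter, and two filters are compared with the Dice coefficient (in the style of Schnell, Bachteler and Reiher's Bloom filter encoding). The function names, the key, the filter size `m`, the number of hash functions `k`, and the choice of HMAC-SHA1/HMAC-MD5 double hashing are all assumptions made for illustration, not the paper's configuration.

```python
import hashlib
import hmac


def bigrams(s: str) -> set[str]:
    """Partition a string into padded, lower-cased character bigrams."""
    padded = f"_{s.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(s: str, key: bytes, m: int = 500, k: int = 15) -> set[int]:
    """Encode a string's bigrams into an m-bit Bloom filter using k keyed hashes.

    The filter is returned as the set of bit positions that are 1.
    Positions come from double hashing: (h1 + i * h2) mod m, where h1 and h2
    are keyed HMACs (SHA-1 and MD5 here; an illustrative assumption).
    """
    bits: set[int] = set()
    for gram in bigrams(s):
        data = gram.encode()
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.md5).digest(), "big")
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two Bloom filters given as sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"  # hypothetical key agreed by the data owners
    a = bloom_encode("smith", key)
    b = bloom_encode("smyth", key)
    c = bloom_encode("jones", key)
    print(round(dice(a, b), 2), round(dice(a, c), 2))
```

Because "smith" and "smyth" share four of their six padded bigrams, their filters overlap far more than those of "smith" and "jones", so the similarity survives encoding and a linking party can compare encodings without seeing plaintext names. One plausible reason such encodings are weaker on security is that frequent bigrams produce frequent bit patterns across many filters, which an adversary may exploit; the reduced-dimensional mapping approaches mentioned in the abstract are not shown here.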