A critical investigation of recall and precision as measures of retrieval system performance
ACM Transactions on Information Systems (TOIS)
Applied cryptography (2nd ed.): protocols, algorithms, and source code in C
Applied cryptography (2nd ed.): protocols, algorithms, and source code in C
The merge/purge problem for large databases
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Foundations of statistical natural language processing
Foundations of statistical natural language processing
Information sharing across private databases
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Practical Techniques for Searches on Encrypted Data
SP '00 Proceedings of the 2000 IEEE Symposium on Security and Privacy
Secure and private sequence comparisons
Proceedings of the 2003 ACM workshop on Privacy in the electronic society
Privacy-preserving data linkage protocols
Proceedings of the 2004 ACM workshop on Privacy in the electronic society
Blocking-aware private record linkage
Proceedings of the 2nd international workshop on Information quality in information systems
Duplicate Record Detection: A Survey
IEEE Transactions on Knowledge and Data Engineering
Privacy preserving schema and data matching
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
A Comparison of Personal Name Matching: Techniques and Practical Issues
ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops
Privacy-Preserving Data Linkage and Geocoding: Current Approaches and Research Directions
ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops
Privacy-Preserving String Comparisons in Record Linkage Systems: A Review
Information Security Journal: A Global Perspective
Data Quality and Record Linkage Techniques
Data Quality and Record Linkage Techniques
A Hybrid Approach to Private Record Linkage
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Efficient Private Record Linkage
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Privacy Preserving Record Linkage Using Phonetic Codes
BCI '09 Proceedings of the 2009 Fourth Balkan Conference in Informatics
Private record matching using differential privacy
Proceedings of the 13th International Conference on Extending Database Technology
Privacy-preserving record linkage
PSD'10 Proceedings of the 2010 international conference on Privacy in statistical databases
A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication
IEEE Transactions on Knowledge and Data Engineering
Efficient two-party private blocking based on sorted nearest neighborhood clustering
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
An iterative two-party protocol for scalable privacy-preserving record linkage
AusDM '12 Proceedings of the Tenth Australasian Data Mining Conference - Volume 134
Hi-index | 0.00 |
The task of linking multiple databases with the aim to identify records that refer to the same entity is occurring increasingly in many application areas. If unique identifiers for the entities are not available in all the databases to be linked, techniques that calculate approximate similarities between records must be used for the identification of matching pairs of records. Often, the records to be linked contain personal information such as names and addresses. In many applications, the exchange of attribute values that contain such personal details between organisations is not allowed due to privacy concerns. The linking of records between databases without revealing the actual attribute values in these records is the research problem known as 'privacy-preserving record linkage' (PPRL). While various approaches have been proposed to deal with privacy within the record linkage process, a viable solution that is well applicable to real-world conditions needs to address the major aspect of scalability of linking very large databases while preserving security and linkage quality. We propose a novel two-party protocol for PPRL that addresses scalability, security and quality/accuracy. The protocol is based on (1) the use of reference values that are available to both database owners, and allows them to individually calculate the similarities between their attribute values and the reference values; and (2) the binning of these calculated similarity values to allow their secure exchange between the two database owners. Experiments on a real-world database with nearly two million records yield linkage results that have a linear scalability to large databases and high linkage accuracy, allowing for approximate matching in the privacy-preserving context. Since the protocol has a low computational burden and allows quality approximate matching while still preserving the privacy of the databases that are matched, the protocol can be useful for many real-world applications requiring PPRL.