Integrating information dispersed across multiple repositories is a crucial step for accurate data analysis in various domains. In support of this goal, it is critical to devise procedures for identifying similar records across distinct data sources. At the same time, to adhere to privacy regulations and policies, such procedures should protect the confidentiality of the individuals to whom the information corresponds. Various private record linkage (PRL) protocols have been proposed to achieve this goal, involving secure multi-party computation (SMC) and similarity-preserving data transformation techniques. SMC methods provide secure and accurate solutions to the PRL problem but are prohibitively expensive in practice, mainly due to excessive computational requirements. Data transformation techniques offer more practical solutions but incur the cost of information leakage and false matches. In this paper, we introduce a novel model for practical PRL, which 1) affords controlled and limited information leakage and 2) avoids false matches resulting from data transformation. First, we partition the data sources into blocks to eliminate comparisons for records that are unlikely to match. Then, to identify matches, we apply an efficient SMC technique to the candidate record pairs. To enable efficiency and privacy, our model leaks a controlled amount of obfuscated data prior to the secure computations. The obfuscation relies on differential privacy, which provides strong privacy guarantees against adversaries with arbitrary background knowledge. In addition, we illustrate the practical nature of our approach through an empirical analysis with data derived from public voter records.
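The block-then-compare pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's protocol, and everything in it is an assumption made for illustration: the blocking key (first letter of the surname), the record fields, the helper names, and the choice of the Laplace mechanism on per-block counts as the differentially private release. The plaintext `is_match` function merely stands in for the secure comparison step, which a real PRL system would carry out with an SMC technique.

```python
import random
import string
from collections import defaultdict

# Assumption: a fixed, data-independent set of block keys (surname initials A-Z).
DOMAIN = list(string.ascii_uppercase)

def block_key(record):
    """Hypothetical blocking key: upper-cased first letter of the surname."""
    return record["surname"][:1].upper()

def build_blocks(records):
    """Group one party's records by blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    return blocks

def laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_block_sizes(blocks, epsilon, rng):
    """Release a noisy count for every key in DOMAIN (Laplace mechanism).

    Each record falls in exactly one block, so adding or removing a record
    changes one count by 1; the sensitivity is 1 and the scale is 1/epsilon.
    """
    scale = 1.0 / epsilon
    return {k: len(blocks.get(k, [])) + laplace(scale, rng) for k in DOMAIN}

def candidate_pairs(blocks_a, blocks_b):
    """Yield only pairs that share a block, skipping unlikely matches."""
    for key in DOMAIN:
        for a in blocks_a.get(key, []):
            for b in blocks_b.get(key, []):
                yield a, b

def is_match(a, b):
    """Plaintext stand-in for the secure (SMC) comparison of one pair."""
    return (a["surname"].lower(), a["birth_year"]) == (
        b["surname"].lower(),
        b["birth_year"],
    )

def link(records_a, records_b):
    """Block both sides, then compare candidate pairs only."""
    blocks_a, blocks_b = build_blocks(records_a), build_blocks(records_b)
    return [
        (a["id"], b["id"])
        for a, b in candidate_pairs(blocks_a, blocks_b)
        if is_match(a, b)
    ]

# Toy data, invented for this example.
party_a = [
    {"id": "a1", "surname": "Smith", "birth_year": 1980},
    {"id": "a2", "surname": "Jones", "birth_year": 1975},
    {"id": "a3", "surname": "Stone", "birth_year": 1990},
]
party_b = [
    {"id": "b1", "surname": "smith", "birth_year": 1980},
    {"id": "b2", "surname": "Baker", "birth_year": 1975},
]

rng = random.Random(0)
released = noisy_block_sizes(build_blocks(party_a), epsilon=1.0, rng=rng)
matches = link(party_a, party_b)
```

On this toy input, blocking reduces the six possible cross-party pairs to the two that share the surname initial "S", and only those two reach the comparison step. The noisy counts in `released` show the kind of obfuscated summary a party might disclose before the secure computations; how such a summary is actually constructed and used is specific to the paper's model and is not reproduced here.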