Privacy-Preserving Record Linkage (PPRL) is the field that studies how to link datasets in order to identify common entities efficiently and accurately while preserving the privacy of the underlying data. In this paper we present a distributed framework, based on Locality-Sensitive Hashing (LSH), for linking very large collections of records. The framework groups similar records efficiently and distributes computations uniformly among underutilized commodity hardware resources, without imposing extra overhead on the existing infrastructure, which promotes scalability. We also propose two methods for assessing computational cost, with the aim of distributing the workload evenly among compute nodes.
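To make the two ideas in the abstract concrete (LSH-based grouping of similar records and cost-aware distribution of work), the following is a minimal sketch and not the framework described in the paper. It rests on several assumptions: records are plain strings tokenized into character bigrams, the LSH family is MinHash with banding, the cost of a block is taken to be its number of pairwise comparisons, and blocks are assigned to nodes with a greedy heuristic. The function names (`lsh_blocks`, `block_cost`, `assign_blocks`) and all parameters are hypothetical, and the privacy-preserving encoding step is omitted entirely.

```python
import hashlib
from collections import defaultdict


def bigrams(text):
    """Return the set of character bigrams of a lowercased, space-free string."""
    s = text.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, keyed by an integer seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value under each seeded hash."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def lsh_blocks(records, num_bands=8, rows_per_band=2):
    """Group record ids whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for rec_id, text in records.items():
        tokens = bigrams(text)
        if not tokens:  # too short to tokenize; skip in this sketch
            continue
        sig = minhash_signature(tokens, num_bands * rows_per_band)
        for band in range(num_bands):
            start = band * rows_per_band
            key = (band, tuple(sig[start:start + rows_per_band]))
            buckets[key].add(rec_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


def block_cost(block):
    """Assumed cost model: number of pairwise comparisons within a block."""
    n = len(block)
    return n * (n - 1) // 2


def assign_blocks(blocks, num_nodes):
    """Greedy assignment: most expensive block first, to the least-loaded node."""
    loads = [0] * num_nodes
    assignment = [[] for _ in range(num_nodes)]
    for block in sorted(blocks, key=block_cost, reverse=True):
        target = loads.index(min(loads))
        assignment[target].append(block)
        loads[target] += block_cost(block)
    return assignment, loads


if __name__ == "__main__":
    records = {"r1": "john smith", "r2": "jon smith", "r3": "maria garcia"}
    blocks = lsh_blocks(records)
    assignment, loads = assign_blocks(blocks, num_nodes=2)
    print(blocks, loads)
```

The pairwise-comparison count is only one plausible cost estimate; the abstract mentions two cost-assessment methods without specifying them, so any other estimate can be substituted for `block_cost` without changing the assignment step.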